This project applies machine learning techniques to the Titanic dataset to predict whether a passenger survived.
Project Workflow
Data preprocessing and cleaning
Exploratory data analysis
Feature engineering
Model training and comparison
Model evaluation
Data Preprocessing
Removed irrelevant features: PassengerId, Name, Cabin, Ticket
Filled missing values (Age with median, Embarked with mode)
Encoded categorical variables (Sex with label encoding, Embarked with one-hot encoding)
Created a new feature: FamilySize = SibSp + Parch
Handled outliers using the IQR method
Applied feature scaling for some models
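The cleaning steps above can be sketched with pandas roughly as follows. This is a minimal illustration, not the project's actual code: the `preprocess` helper, the toy rows, the `female`/`male` integer mapping, the choice of `Age` and `Fare` for outlier handling, and the decision to clip outliers rather than drop rows are all assumptions.

```python
import pandas as pd


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper applying the steps listed above to a Titanic-style frame."""
    df = df.copy()

    # Remove features treated as irrelevant.
    df = df.drop(columns=["PassengerId", "Name", "Cabin", "Ticket"], errors="ignore")

    # Fill missing values: Age with the median, Embarked with the mode.
    df["Age"] = df["Age"].fillna(df["Age"].median())
    df["Embarked"] = df["Embarked"].fillna(df["Embarked"].mode()[0])

    # Label-encode Sex (assumed mapping) and one-hot encode Embarked.
    df["Sex"] = df["Sex"].map({"female": 0, "male": 1})
    df = pd.get_dummies(df, columns=["Embarked"], dtype=int)

    # New feature: number of relatives aboard.
    df["FamilySize"] = df["SibSp"] + df["Parch"]

    # IQR rule: values beyond 1.5 * IQR from the quartiles count as outliers.
    # Assumption: clip them to the fences instead of removing the rows.
    for col in ["Age", "Fare"]:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        df[col] = df[col].clip(lower=q1 - 1.5 * iqr, upper=q3 + 1.5 * iqr)

    return df


# Toy rows invented for illustration; they are not taken from the real dataset.
sample = pd.DataFrame({
    "PassengerId": [1, 2, 3, 4, 5, 6],
    "Survived": [0, 1, 1, 1, 0, 0],
    "Pclass": [3, 1, 3, 1, 3, 3],
    "Name": ["A", "B", "C", "D", "E", "F"],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Age": [22.0, 38.0, 26.0, 35.0, None, 54.0],
    "SibSp": [1, 1, 0, 1, 0, 0],
    "Parch": [0, 0, 0, 0, 0, 0],
    "Ticket": ["T1", "T2", "T3", "T4", "T5", "T6"],
    "Fare": [7.25, 71.28, 7.92, 53.10, 8.05, 512.33],
    "Cabin": [None, "C85", None, "C123", None, None],
    "Embarked": ["S", "C", "S", "S", None, "Q"],
})

clean = preprocess(sample)
print(clean.head())
```

Feature scaling is left out of this helper on purpose: fitting a scaler inside a per-model pipeline keeps it from seeing the test data.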
Models Used
Logistic Regression, KNN, Decision Tree, Extra Trees, Random Forest, Gradient Boosting, XGBoost, LightGBM, and Support Vector Machine.
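A comparison of this kind might look like the sketch below, using scikit-learn. Several things here are assumptions rather than the project's setup: the synthetic data from `make_classification` stands in for the preprocessed Titanic features, 5-fold cross-validation is one possible comparison protocol, and scaling is applied only to Logistic Regression, KNN, and SVM. XGBoost and LightGBM are omitted because they live in separate packages, but their scikit-learn-compatible classifiers could be added to the same dictionary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    ExtraTreesClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Placeholder data; in the project this would be the cleaned Titanic features.
X, y = make_classification(n_samples=300, n_features=8, random_state=0)

models = {
    # Scale-sensitive models get a StandardScaler fitted inside each CV fold.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    # Tree-based models do not depend on feature scale.
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Extra Trees": ExtraTreesClassifier(n_estimators=100, random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}

# Mean cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print models from best to worst.
for name, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:20s} {score:.3f}")
```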
Best Model
The Support Vector Machine achieved the best performance, with about 87% accuracy.
Evaluation
Model performance was evaluated using accuracy, precision, recall, F1-score, and a confusion matrix.
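As a small illustration of how these metrics relate, the sketch below computes them with `sklearn.metrics` on hand-made labels. The `y_true` and `y_pred` values are invented for the example and are not results from this project.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Invented labels: 1 = survived, 0 = did not survive.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred)

accuracy = accuracy_score(y_true, y_pred)    # (TP + TN) / total
precision = precision_score(y_true, y_pred)  # TP / (TP + FP)
recall = recall_score(y_true, y_pred)        # TP / (TP + FN)
f1 = f1_score(y_true, y_pred)                # harmonic mean of precision and recall

print(cm)
print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```

With 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, all four scores come out to 0.75 here. On imbalanced data they can diverge, which is one reason to report more than accuracy.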