A machine learning project that uses binary classification to predict whether a Titanic passenger survived, based on the available passenger data.
Key Features:
End-to-End ML Pipeline: Covers all stages from data collection to modelling:
Data acquisition
Data cleaning
Exploratory Data Analysis (EDA)
Feature engineering
Feature selection
Model training and evaluation
Smart Data Analysis: Handles missing values, encodes categorical features, and explores relationships between features and survival.
Multiple ML Models: Experiments with models such as Logistic Regression, Random Forest, and Gradient Boosting, then selects the best-performing one.
Performance Evaluation: Uses Accuracy, Precision, Recall, F1-score, and ROC-AUC metrics for robust model assessment.
Interpretable Predictions: Identifies feature importance to understand which factors most influence survival.
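As a rough illustration of the feature-importance idea, the sketch below fits a Random Forest and ranks its inputs. It assumes pandas and scikit-learn, which this description does not name, and it runs on synthetic data with a made-up labelling rule, so the numbers say nothing about the real Titanic passengers. The column name IsFemale is hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (NOT the real Titanic data).
rng = np.random.default_rng(1)
n = 300
features = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "IsFemale": rng.integers(0, 2, n),   # hypothetical encoded Sex column
    "Age": rng.uniform(1, 70, n).round(),
})
# Made-up rule so that the toy labels depend on some of the features.
score = 2.0 * features["IsFemale"] - 0.5 * features["Pclass"] + rng.normal(0, 1, n)
labels = (score > 0).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(features, labels)

# Impurity-based importances, one value per input column, highest first.
importance = pd.Series(forest.feature_importances_, index=features.columns)
importance = importance.sort_values(ascending=False)
print(importance)
```

Impurity-based importances are only one way to rank features; for a Logistic Regression model the fitted coefficients would play a similar role.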
Implementation Steps:
Load Dataset: Use Titanic passenger data from Kaggle or official sources.
Explore Data: Analyse columns such as Age, Sex, Passenger Class, SibSp, and Parch.
Data Cleaning:
Handle missing values (e.g., Age, Cabin).
Remove columns that are not used as features (e.g., Ticket; Name can be dropped once titles have been extracted from it).
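A minimal sketch of the cleaning step, assuming pandas and the Kaggle column names. Median imputation for Age and the HasCabin flag are illustrative choices rather than the project's confirmed method, and the three rows are a small illustrative sample, not the full dataset.

```python
import pandas as pd

# Small illustrative frame using the Kaggle column names.
raw = pd.DataFrame({
    "Name": ["Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "Allen, Mr. William Henry"],
    "Ticket": ["A/5 21171", "STON/O2. 3101282", "373450"],
    "Age": [22.0, None, 35.0],
    "Cabin": [None, None, "C85"],
})

cleaned = raw.copy()

# Fill missing ages with the median age (one common choice among several).
cleaned["Age"] = cleaned["Age"].fillna(cleaned["Age"].median())

# Cabin is often missing, so keep only a flag recording whether it was present.
# HasCabin is a hypothetical column name introduced for this sketch.
cleaned["HasCabin"] = cleaned["Cabin"].notna().astype(int)

# Drop columns not used as model inputs. Name is kept here because the
# feature engineering step still needs it for title extraction.
cleaned = cleaned.drop(columns=["Cabin", "Ticket"])

print(cleaned)
```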
Feature Engineering:
Create new features like family size (SibSp + Parch).
Extract titles from passenger names to capture social status.
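The two engineered features above could be derived roughly as follows. This is a sketch assuming pandas and the Kaggle name format "Surname, Title. Given names"; FamilySize and Title are column names chosen for illustration.

```python
import pandas as pd

people = pd.DataFrame({
    "Name": [
        "Braund, Mr. Owen Harris",
        "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
        "Heikkinen, Miss. Laina",
    ],
    "SibSp": [1, 1, 0],
    "Parch": [0, 0, 0],
})

# Family size as defined above: siblings/spouses plus parents/children aboard.
people["FamilySize"] = people["SibSp"] + people["Parch"]

# The title is the text between the comma and the first period in the name.
people["Title"] = (
    people["Name"].str.extract(r",\s*([^.]+)\.", expand=False).str.strip()
)

print(people[["FamilySize", "Title"]])
```

Some variants add 1 to the family size so that the passenger is counted too; the version here follows the definition given above.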
Data Transformation: Encode categorical features using Label Encoding or One-Hot Encoding.
Train-Test Split: Divide data into training and testing sets.
Model Training: Train and compare multiple classification models to select the best.
Model Evaluation: Assess each model on the held-out test set using Accuracy, Precision, Recall, F1-score, and ROC-AUC, rather than relying on a single metric.
Prediction: Use the trained model to predict survival for new passengers.
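The encoding, splitting, training, evaluation, and prediction steps can be sketched together as below. This assumes pandas and scikit-learn, uses One-Hot Encoding via pd.get_dummies, and runs on synthetic data with a made-up labelling rule so that it is self-contained. The 80/20 split, the hyperparameters, and the use of ROC-AUC to pick the winner are illustrative assumptions, not the project's confirmed settings.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the cleaned data (NOT the real Titanic data).
rng = np.random.default_rng(0)
n = 400
data = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.uniform(1, 70, n).round(),
    "FamilySize": rng.integers(0, 6, n),
})
# Made-up rule so the toy labels carry some learnable signal.
signal = (data["Sex"] == "female") * 2.0 - 0.5 * data["Pclass"] + rng.normal(0, 1, n)
data["Survived"] = (signal > 0).astype(int)

# Data transformation: one-hot encode the categorical column.
X = pd.get_dummies(data.drop(columns="Survived"), columns=["Sex"], drop_first=True)
y = data["Survived"]

# Train-test split: hold out 20% of the rows, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Model training: fit each candidate on the training set.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "Gradient Boosting": GradientBoostingClassifier(random_state=42),
}

# Model evaluation: compute every metric on the held-out test set.
results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    proba = model.predict_proba(X_test)[:, 1]
    results[name] = {
        "accuracy": accuracy_score(y_test, pred),
        "precision": precision_score(y_test, pred, zero_division=0),
        "recall": recall_score(y_test, pred, zero_division=0),
        "f1": f1_score(y_test, pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, proba),
    }
print(pd.DataFrame(results).T.round(3))

# Pick the model with the highest ROC-AUC (an illustrative selection rule).
best_name = max(results, key=lambda k: results[k]["roc_auc"])
best_model = models[best_name]

# Prediction: encode a new passenger the same way and align the columns.
new_passenger = pd.DataFrame(
    [{"Pclass": 1, "Sex": "female", "Age": 30.0, "FamilySize": 1}]
)
new_X = pd.get_dummies(new_passenger, columns=["Sex"])
new_X = new_X.reindex(columns=X.columns, fill_value=0)
prediction = int(best_model.predict(new_X)[0])
print(best_name, "predicts Survived =", prediction)
```

Selecting the winner on the test set keeps the sketch short; with real data, cross-validation on the training set is usually a safer way to choose a model, leaving the test set for a final check.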
Project Value:
Provides insights into factors influencing passenger survival on the Titanic.
The same pipeline can be adapted to similar datasets for survival or other binary outcome predictions.
Makes a strong portfolio piece, demonstrating skills in data preprocessing, feature engineering, model selection, and evaluation.