Built an end-to-end machine learning pipeline to predict customer churn for a telecommunications company using a real-world dataset (IBM Telco Customer Churn).
• Data Cleaning & Preprocessing — handled missing values, dropped irrelevant features, and encoded binary columns
• Feature Engineering — applied Target Encoding for high-cardinality categorical variables
• Feature Selection — used Chi-Square test for categorical features and ANOVA F-test for numerical features to select the most predictive variables
• Model Development — trained and compared 3 classification models: Logistic Regression, Decision Tree, and Random Forest
• Hyperparameter Tuning — optimized each model using GridSearchCV with 5-Fold Cross Validation
• Model Evaluation — assessed performance using Precision, Recall, and F1-Score, summarized per class in scikit-learn's classification report
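
The selection, tuning, and evaluation steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: synthetic data from `make_classification` stands in for the Telco dataset, the class split, the parameter grids, `k=8`, and the F1 scoring choice are all assumptions, and only the ANOVA F-test is shown because the synthetic features are numeric (a Chi-Square selector would be applied to the encoded categorical columns in the same way).

```python
# Illustrative sketch only: synthetic data replaces the Telco dataset, and
# every grid value, k, and the scoring metric below are assumptions.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, f1_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Imbalanced binary target, loosely mimicking a churn label (assumed split).
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.73, 0.27], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# The three model families named above, each with a small hypothetical grid.
candidates = {
    "logistic_regression": (
        LogisticRegression(max_iter=1000),
        {"model__C": [0.1, 1.0, 10.0]},
    ),
    "decision_tree": (
        DecisionTreeClassifier(random_state=0),
        {"model__max_depth": [3, 5, None]},
    ),
    "random_forest": (
        RandomForestClassifier(random_state=0),
        {"model__n_estimators": [50, 100], "model__max_depth": [5, None]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Scaling and ANOVA F-test selection sit inside the pipeline, so they are
    # refit on the training part of each of the 5 cross-validation folds.
    pipeline = Pipeline([
        ("scale", StandardScaler()),
        ("select", SelectKBest(score_func=f_classif, k=8)),
        ("model", estimator),
    ])
    search = GridSearchCV(pipeline, grid, cv=5, scoring="f1")
    search.fit(X_train, y_train)

    # Evaluate the refit best estimator on the held-out test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "test_f1": f1_score(y_test, y_pred),
    }
    print(name, search.best_params_)
    print(classification_report(y_test, y_pred, digits=3))
```

Keeping feature selection inside the `Pipeline` is one way to avoid leaking validation-fold information into the selected feature set during the grid search. Whether the original project structured it this way is not stated.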