This project is an end-to-end machine learning pipeline that predicts the risk of heart disease from clinical and demographic data.
Data Preparation & Transformation
• Cleaned and standardized the dataset, addressing missing values and anomalies.
• Applied scaling to numerical features and encoding to categorical ones.
• Extracted meaningful features through PCA, statistical tests, and model-based importance ranking.
• Identified 13 critical clinical attributes (e.g., cholesterol, age, exercise-induced angina).
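The preparation steps above can be sketched with scikit-learn as follows. This is a minimal illustration, not the project's actual code: the data is synthetic, the column names (`age`, `chol`, `thalach`, `cp`, `exang`) are hypothetical stand-ins for clinical attributes, and the imputation strategies and the number of PCA components are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names and value ranges are hypothetical.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(29, 78, n).astype(float),
    "chol": rng.normal(245, 50, n),
    "thalach": rng.normal(150, 22, n),
    "cp": rng.integers(0, 4, n),
    "exang": rng.integers(0, 2, n),
})
df.loc[::25, "chol"] = np.nan  # inject a few missing values

numeric_cols = ["age", "chol", "thalach"]
categorical_cols = ["cp", "exang"]

# Numerical features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
# Categorical features: fill gaps with the mode, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

preprocess = ColumnTransformer(
    [("num", numeric_steps, numeric_cols),
     ("cat", categorical_steps, categorical_cols)],
    sparse_threshold=0.0,  # always return a dense array so PCA can consume it
)

features = Pipeline([("preprocess", preprocess), ("pca", PCA(n_components=3))])
X_reduced = features.fit_transform(df)

print(X_reduced.shape)
print(features.named_steps["pca"].explained_variance_ratio_.round(3))
```

Keeping imputation, scaling, and encoding inside one `Pipeline` means the same fitted transformations are reused at prediction time, which is one common way to avoid train/test leakage.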
Model Development & Optimization
• Built and evaluated multiple algorithms: Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine.
• Achieved over 87% accuracy, with recall and F1-score improving after tuning.
• Enhanced models using GridSearchCV and RandomizedSearchCV for hyperparameter optimization.
• Explored clustering (K-Means, Hierarchical) to uncover hidden patient patterns.
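A minimal sketch of the supervised part of this workflow is shown below: the four classifiers are compared with cross-validation, then one of them is tuned with `GridSearchCV`. The data comes from `make_classification` with 13 features as a stand-in for the real dataset, and the parameter grid, the F1 scoring choice, and the decision to tune the Random Forest are assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary-classification data standing in for the clinical dataset.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models get a scaler inside the pipeline so that scaling
# is refit on each training fold only.
candidates = {
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
    "decision_tree": DecisionTreeClassifier(random_state=42),
    "random_forest": RandomForestClassifier(n_estimators=50, random_state=42),
    "svm": make_pipeline(StandardScaler(), SVC()),
}

# Mean 5-fold cross-validated F1 for each candidate.
cv_scores = {
    name: cross_val_score(model, X_train, y_train, cv=5, scoring="f1").mean()
    for name, model in candidates.items()
}
for name, score in cv_scores.items():
    print(f"{name}: mean CV F1 = {score:.3f}")

# Hypothetical, deliberately small grid for the Random Forest.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=42), param_grid, cv=5, scoring="f1"
)
search.fit(X_train, y_train)

# With scoring="f1", score() reports F1 on the held-out test set.
test_f1 = search.score(X_test, y_test)
print("best params:", search.best_params_)
print(f"held-out F1: {test_f1:.3f}")
```

`RandomizedSearchCV` accepts the same estimator and takes a `param_distributions` argument plus `n_iter`, which is usually the cheaper choice when the grid grows large.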
Performance Analysis
• Assessed models using Accuracy, Precision, Recall, F1-score, and ROC AUC.
• Visualized insights via ROC/PR curves, confusion matrices, and PCA plots.
• Identified maximum heart rate during exercise and ST depression as the top predictors.
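The metrics listed above can be computed with `sklearn.metrics`, as in this small sketch. The labels and predicted probabilities are made-up toy values chosen so the arithmetic is easy to check by hand; they are not results from the project, and the 0.5 decision threshold is an assumption.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Toy ground truth and predicted probabilities (illustrative only).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0])
y_prob = np.array([0.9, 0.2, 0.6, 0.4, 0.3, 0.7, 0.8, 0.1])
y_pred = (y_prob >= 0.5).astype(int)  # assumed 0.5 threshold

metrics = {
    "accuracy": accuracy_score(y_true, y_pred),
    "precision": precision_score(y_true, y_pred),
    "recall": recall_score(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    # ROC AUC is computed from probabilities, not thresholded labels.
    "roc_auc": roc_auc_score(y_true, y_prob),
}
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
print(cm)
```

On this toy data there are 3 true positives, 3 true negatives, 1 false positive, and 1 false negative, so accuracy, precision, recall, and F1 all equal 0.75, while ROC AUC is 0.875 because 14 of the 16 positive/negative pairs are ranked correctly.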
Deployment Readiness
• Exported the final optimized model (final_model.pkl) with metadata and evaluation reports.
• Built a modular ML pipeline that ensures reproducibility and prevents data leakage.
• Designed a prototype Streamlit app and documented its deployment through an Ngrok tunnel for secure remote access.
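One way the export step might look is sketched below. Only the file name `final_model.pkl` comes from the description above; the use of `joblib`, the `metadata.json` file and its fields, the temporary output directory, and the synthetic training data are all assumptions for illustration.

```python
import json
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the real dataset (13 features).
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, stratify=y, random_state=0
)

# Preprocessing and classifier are saved together as a single object.
model = Pipeline([
    ("scale", StandardScaler()),
    ("clf", RandomForestClassifier(n_estimators=50, random_state=0)),
])
model.fit(X_train, y_train)

# Write to a temporary directory so the sketch leaves the working dir clean.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "final_model.pkl"
joblib.dump(model, model_path)

# Hypothetical metadata file recorded alongside the model.
metadata = {
    "n_features": int(X.shape[1]),
    "test_accuracy": float(model.score(X_test, y_test)),
}
metadata_path = out_dir / "metadata.json"
metadata_path.write_text(json.dumps(metadata, indent=2))

# Reload and confirm the round trip preserves predictions.
reloaded = joblib.load(model_path)
same_predictions = bool(
    (reloaded.predict(X_test) == model.predict(X_test)).all()
)
print("round trip OK:", same_predictions)
```

Serializing the whole pipeline rather than the bare classifier means a serving app, such as a Streamlit front end, only needs to load one file and call `predict` on raw feature values.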
Impact: The project highlights the power of machine learning in healthcare by integrating robust preprocessing, advanced feature engineering, supervised & unsupervised learning, and deployment strategies into a reproducible and explainable prediction system.
For more information, please see the attached materials.
This project was developed as part of the Sprints x Microsoft Summer Camp (AI & ML).
Skills: Machine Learning · Data Preprocessing & Feature Engineering · Random Forest & Model Optimization · Unsupervised Learning / Clustering · Deployment & Web Applications · Programming & Tools · Data Science & Processing · Program Management · Data Handling