Project Details

This project is an end-to-end machine learning pipeline that predicts the risk of heart disease from clinical and demographic data.

Data Preparation & Transformation

• Cleaned and standardized the dataset, addressing missing values and anomalies.

• Applied scaling to numerical features and encoding to categorical ones.

• Extracted meaningful features through PCA, statistical tests, and model-based importance ranking.

• Identified 13 critical clinical attributes (e.g., cholesterol, age, exercise-induced angina).
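The preparation steps above could be sketched roughly as follows. This is a minimal illustration using scikit-learn, not the project's actual code: the rows are synthetic, and the column names (age, chol, max_heart_rate, chest_pain_type, exercise_angina) are assumptions chosen only to resemble typical heart-disease attributes.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in rows; column names are assumptions, not the project's schema.
df = pd.DataFrame({
    "age": [63, 45, 58, 52, 39, 61],
    "chol": [233.0, np.nan, 286.0, 199.0, 220.0, 254.0],  # one missing value
    "max_heart_rate": [150, 172, 108, 162, 179, 125],
    "chest_pain_type": ["typical", "atypical", "asymptomatic",
                        "non_anginal", "atypical", "asymptomatic"],
    "exercise_angina": ["no", "no", "yes", "no", "no", "yes"],
})

numeric_cols = ["age", "chol", "max_heart_rate"]
categorical_cols = ["chest_pain_type", "exercise_angina"]

# Numeric columns: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical columns: fill gaps with the most frequent value, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense array so it can be passed straight to PCA.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

X_processed = preprocessor.fit_transform(df)

# Project the processed features onto two principal components for plotting.
X_2d = PCA(n_components=2).fit_transform(X_processed)

print(X_processed.shape, X_2d.shape)
```

Keeping imputation, scaling, and encoding inside one transformer means the same fitted steps can later be reused on new patient records without refitting.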

Model Development & Optimization

• Built and evaluated multiple algorithms: Logistic Regression, Decision Tree, Random Forest, and Support Vector Machine.

• Achieved over 87% accuracy, with recall and F1-score improving after tuning.

• Enhanced models using GridSearchCV and RandomizedSearchCV for hyperparameter optimization.

• Explored clustering (K-Means, Hierarchical) to uncover hidden patient patterns.
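As an illustration of the tuning step, the sketch below runs GridSearchCV over a small scikit-learn pipeline. It is a simplified example under stated assumptions: the data is synthetic (generated with make_classification, using 13 features only to mirror the attribute count above), and the parameter grid and the choice of F1 as the scoring metric are hypothetical, not the values used in the project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data with 13 features, standing in for the clinical dataset.
X, y = make_classification(
    n_samples=300, n_features=13, n_informative=6, random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The scaler lives inside the pipeline, so each cross-validation fold
# fits it on that fold's training part only (no leakage from validation data).
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", RandomForestClassifier(random_state=42)),
])

# Hypothetical search grid; real ranges would depend on the dataset.
param_grid = {
    "model__n_estimators": [50, 100],
    "model__max_depth": [3, None],
}

search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=5)
search.fit(X_train, y_train)

print("best parameters:", search.best_params_)
# score() uses the scorer passed above, so this is F1 on the held-out split.
print(f"held-out F1: {search.score(X_test, y_test):.3f}")
```

RandomizedSearchCV accepts the same pipeline and samples a fixed number of candidates from the parameter space, which is usually cheaper when the grid is large.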

Performance Analysis

• Assessed models using Accuracy, Precision, Recall, F1-score, and ROC AUC.

• Visualized insights via ROC/PR curves, confusion matrices, and PCA plots.

• Found that maximum heart rate during exercise and ST depression were the top predictors.
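To make the metrics listed above concrete, here is a small worked example in plain Python. The labels and scores are made up for illustration, and the 0.5 decision threshold is an assumption; the helper names are hypothetical and not taken from the project.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = disease)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


def roc_auc(y_true, y_score):
    """ROC AUC as the share of (positive, negative) pairs ranked correctly."""
    positives = [s for t, s in zip(y_true, y_score) if t == 1]
    negatives = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(positives) * len(negatives))


# Made-up labels and predicted probabilities for eight patients.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.2, 0.4, 0.8, 0.1, 0.6, 0.7, 0.3]
y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, y_score)

print(metrics)  # every metric is 0.75 here (TP=3, TN=3, FP=1, FN=1)
print(auc)      # 0.9375 (15 of 16 positive/negative pairs ranked correctly)
```

In this toy case the threshold-based metrics all land on 0.75 while ROC AUC is higher, because AUC measures ranking quality across all thresholds rather than at one cut-off.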

Deployment Readiness

• Exported the final optimized model (final_model.pkl) with metadata and evaluation reports.

• Built a modular ML pipeline that ensures reproducibility and prevents data leakage.

• Designed a prototype Streamlit app and documented how to expose it through an Ngrok tunnel for secure remote access.
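The export step might look something like the sketch below. It assumes the model is saved with Python's pickle module (as the .pkl extension suggests) and that the metadata goes into a JSON file next to it. The metadata file name, its fields, the synthetic data, and the LogisticRegression stand-in are all assumptions for illustration.

```python
import json
import pickle
import tempfile
from pathlib import Path

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data and a simple stand-in model; the project's final model may differ.
X, y = make_classification(n_samples=200, n_features=13, random_state=0)
model = LogisticRegression(max_iter=1000).fit(X, y)

# Write into a temporary directory so the example leaves no files behind in the repo.
out_dir = Path(tempfile.mkdtemp())
model_path = out_dir / "final_model.pkl"
meta_path = out_dir / "final_model_metadata.json"  # hypothetical file name

with model_path.open("wb") as f:
    pickle.dump(model, f)

# Hypothetical metadata fields describing what was saved.
metadata = {
    "model_class": type(model).__name__,
    "n_features": int(model.n_features_in_),
    "train_accuracy": float(model.score(X, y)),
}
meta_path.write_text(json.dumps(metadata, indent=2))

# Reload and confirm the restored model predicts exactly like the original.
with model_path.open("rb") as f:
    restored = pickle.load(f)

same_predictions = bool((restored.predict(X) == model.predict(X)).all())
print(same_predictions)
```

A pickled model should only be loaded from a trusted source and with a compatible scikit-learn version, which is one reason to record version details in the metadata file.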

Impact: The project highlights the power of machine learning in healthcare by integrating robust preprocessing, advanced feature engineering, supervised & unsupervised learning, and deployment strategies into a reproducible and explainable prediction system.

For more information, please see the attached materials.

This project was developed as part of the Sprints x Microsoft Summer Camp (AI & ML).

Skills: Machine Learning · Data Preprocessing & Feature Engineering · Random Forest & Model Optimization · Unsupervised Learning / Clustering · Deployment & Web Applications · Programming & Tools · Data Science & Processing · Program Management · Data Handling