This project analyzes the Student Performance dataset (UCI Machine Learning Repository) to understand which factors affect academic success. We applied data cleaning, exploratory data analysis (EDA), clustering (K-Means), and supervised learning (Logistic Regression, Decision Tree, Random Forest) to predict whether a student will pass or fail.
The goals are to:
Identify key risk factors for student failure.
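Compare model performance with and without data leakage, that is, with and without the earlier period grades G1 and G2, which are strongly correlated with the final grade.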
Provide actionable recommendations for early interventions.
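
The with/without-leakage comparison comes down to training the same models on two feature sets. The following is a minimal sketch of how those two sets and the pass/fail label could be defined. It rests on assumptions not stated in this section: the target column is taken to be G3, the pass mark is taken to be 10, and the helper names (pass_label, feature_columns) and the example rows are hypothetical, for illustration only.

```python
# Columns treated as leakage in the with/without comparison.
LEAKY_COLUMNS = ("G1", "G2")

# Assumptions: the final grade column and the pass mark are not given in the text.
TARGET_COLUMN = "G3"
PASS_MARK = 10


def pass_label(row):
    """Return 1 if the student passes, 0 otherwise (assumed rule: G3 >= PASS_MARK)."""
    return int(row[TARGET_COLUMN] >= PASS_MARK)


def feature_columns(all_columns, include_leaky):
    """List the predictor columns for one of the two experimental setups."""
    excluded = {TARGET_COLUMN}
    if not include_leaky:
        excluded.update(LEAKY_COLUMNS)
    return [c for c in all_columns if c not in excluded]


# Hypothetical rows for illustration only; these are not values from the dataset.
rows = [
    {"studytime": 2, "failures": 0, "absences": 4, "G1": 12, "G2": 13, "G3": 13},
    {"studytime": 1, "failures": 2, "absences": 10, "G1": 7, "G2": 8, "G3": 8},
]
columns = list(rows[0])

with_leakage = feature_columns(columns, include_leaky=True)
without_leakage = feature_columns(columns, include_leaky=False)
labels = [pass_label(r) for r in rows]
```

Each model would then be fitted once per feature set, so that any gap in performance between the two runs can be attributed to G1 and G2 rather than to a change in the model or the data split.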