I developed a machine learning model to predict students’ exam scores from academic and socio-economic factors. The project covered a complete data science pipeline, from data preprocessing through model evaluation and optimization.
I cleaned and preprocessed the data by handling missing values, encoding categorical variables, and scaling features. I then carried out Exploratory Data Analysis (EDA) with Pandas, NumPy, Matplotlib, and Seaborn to identify patterns and correlations among features such as study time, parental education level, and previous scores.
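The preprocessing steps described above could look roughly like the following. This is a minimal sketch rather than the project's actual code: it assumes scikit-learn for the transformers, and the column names (`study_time`, `previous_score`, `parental_education`) and the toy values are hypothetical stand-ins for the real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data; the real dataset and its column names are assumptions.
df = pd.DataFrame({
    "study_time": [2.0, 5.5, np.nan, 8.0, 3.5],
    "previous_score": [55.0, 70.0, 62.0, np.nan, 48.0],
    "parental_education": ["high_school", "bachelor", np.nan, "master", "high_school"],
})

numeric_cols = ["study_time", "previous_score"]
categorical_cols = ["parental_education"]

# Numeric features: fill gaps with the median, then standardize.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Categorical features: fill gaps with the most frequent label, then one-hot encode.
categorical_steps = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense array so the result is easy to inspect.
preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        ("cat", categorical_steps, categorical_cols),
    ],
    sparse_threshold=0,
)

X = preprocessor.fit_transform(df)
print(X.shape)
```

Wrapping the steps in a single `ColumnTransformer` means the same imputation and scaling statistics learned on the training data are reused at prediction time, which helps avoid leakage when the preprocessor is later combined with a model.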
I implemented several regression models, including Linear Regression, Decision Trees, and Random Forest, to predict exam performance, and evaluated them with Mean Absolute Error (MAE), Mean Squared Error (MSE), and the R² score. Hyperparameter tuning and cross-validation were applied to reduce prediction error and limit overfitting.
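As an illustration of this modelling workflow, the sketch below compares the three model families with 5-fold cross-validation, tunes a Random Forest with a small grid search, and reports MAE, MSE, and R² on a held-out split. It assumes scikit-learn; the synthetic data, the parameter grid, and the split sizes are placeholders chosen for the example, not the settings used in the project.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split
from sklearn.tree import DecisionTreeRegressor

# Synthetic stand-in for the student data (assumption, for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 3))
y = 40 + 4 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 2, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

models = {
    "linear": LinearRegression(),
    "tree": DecisionTreeRegressor(random_state=0),
    "forest": RandomForestRegressor(n_estimators=50, random_state=0),
}

# Compare candidates by cross-validated MAE on the training split only.
# scikit-learn reports negated errors, so the sign is flipped back.
cv_mae = {}
for name, model in models.items():
    scores = cross_val_score(
        model, X_train, y_train, cv=5, scoring="neg_mean_absolute_error"
    )
    cv_mae[name] = -scores.mean()

# Tune the Random Forest over a small, illustrative grid.
grid = GridSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_grid={"max_depth": [3, 6, None], "min_samples_leaf": [1, 5]},
    cv=5,
    scoring="neg_mean_absolute_error",
)
grid.fit(X_train, y_train)

# Final evaluation on data never seen during tuning.
pred = grid.best_estimator_.predict(X_test)
test_metrics = {
    "mae": mean_absolute_error(y_test, pred),
    "mse": mean_squared_error(y_test, pred),
    "r2": r2_score(y_test, pred),
}

print(cv_mae)
print(grid.best_params_)
print(test_metrics)
```

Keeping the test split out of both the cross-validation and the grid search is what makes the final metrics an honest estimate of generalization; reusing it for tuning would understate the error.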
The final model achieved reliable prediction performance and demonstrated the ability to extract meaningful insights from data, highlighting key factors that influence student success. This project strengthened my skills in data preprocessing, feature engineering, model selection, and evaluation within a real-world predictive analytics scenario.