University and Major Recommendation System for Algerian Students

Project Details

Abstract

This project builds a recommendation system that predicts the most suitable university and major for students in Algeria who have obtained their baccalaureate. Using survey data collected from 23,195 students, we trained two Random Forest models: one that predicts the major and one that predicts the university. The data were cleaned, normalized, and explored through visualization before modeling. Random Forest was selected as the final model because it outperformed the other algorithms we tried, such as SVM.

Introduction

Selecting the right university and major is crucial for students' future careers and personal satisfaction. This project addresses that choice by developing a recommendation system tailored to Algerian students. We collected data through a survey, analyzed it, and built machine learning models to make accurate recommendations.

Dataset Description

The dataset consists of responses from 23,195 Algerian students who are currently pursuing their studies at various universities. The survey includes questions about personal demographics, academic performance, and preferences related to university and major choices.

Data Cleaning: Removed missing values and handled inconsistencies.

Normalization: Scaled continuous features like 'Bac Mark' for uniformity.

Data Visualization: Generated plots to understand distributions and relationships in the data.
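
The cleaning and normalization steps above can be illustrated with a minimal sketch. Only the 'Bac Mark' column name comes from this document. The sample rows, the 'Stream' column, and the choice of min-max scaling are assumptions made for illustration, since the exact scaling method used in the project is not stated here.

```python
# Hypothetical survey rows; only the 'Bac Mark' column name comes from the text.
rows = [
    {"Bac Mark": 12.5, "Stream": "Sciences"},
    {"Bac Mark": 16.0, "Stream": "Mathematics"},
    {"Bac Mark": None, "Stream": "Sciences"},  # missing value, dropped below
    {"Bac Mark": 10.0, "Stream": "Literature"},
]


def drop_missing(rows):
    """Keep only rows in which every field has a value."""
    return [r for r in rows if all(v is not None and v != "" for v in r.values())]


def min_max_scale(rows, column):
    """Return copies of the rows with `column` rescaled to the range [0, 1]."""
    values = [r[column] for r in rows]
    lo, hi = min(values), max(values)
    span = hi - lo
    scaled = []
    for r in rows:
        new_row = dict(r)
        # If every value is identical, map them all to 0.0 to avoid dividing by zero.
        new_row[column] = (r[column] - lo) / span if span else 0.0
        scaled.append(new_row)
    return scaled


clean_rows = drop_missing(rows)
scaled_rows = min_max_scale(clean_rows, "Bac Mark")

for row in scaled_rows:
    print(row)
```

After scaling, the lowest mark in the sample maps to 0.0 and the highest to 1.0, so the feature shares a common range with any other scaled columns.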

Model Selection

After experimenting with several machine learning algorithms, including SVM, we chose Random Forest because it performed best. Two Random Forest models were trained: one that predicts the major and one that predicts the university.
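
A rough sketch of this setup, assuming scikit-learn and NumPy are available, might look like the following. The synthetic features, the integer-encoded targets, and all hyperparameters are placeholders chosen for illustration. They are not the project's actual data or configuration, and the scores this sketch prints say nothing about the real results.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the encoded survey features (NOT the real data).
rng = np.random.default_rng(0)
X = rng.random((300, 4))

# Hypothetical integer-encoded targets, derived from the features so that
# there is a pattern for the models to learn.
y_major = (X[:, 0] > 0.5).astype(int) + (X[:, 1] > 0.5).astype(int)  # 3 classes
y_university = (X[:, 2] > 0.5).astype(int)  # 2 classes

# One shared split keeps both targets aligned with the same rows.
X_train, X_test, maj_train, maj_test, uni_train, uni_test = train_test_split(
    X, y_major, y_university, test_size=0.2, random_state=0
)

# Two separate Random Forest models, one per target.
major_model = RandomForestClassifier(n_estimators=100, random_state=0)
major_model.fit(X_train, maj_train)

university_model = RandomForestClassifier(n_estimators=100, random_state=0)
university_model.fit(X_train, uni_train)

# SVM baseline on the major target, for comparison on the same split.
svm_baseline = SVC()
svm_baseline.fit(X_train, maj_train)

rf_major_acc = major_model.score(X_test, maj_test)
svm_major_acc = svm_baseline.score(X_test, maj_test)
rf_university_acc = university_model.score(X_test, uni_test)

print(f"Random Forest (major):      {rf_major_acc:.2f}")
print(f"SVM baseline (major):       {svm_major_acc:.2f}")
print(f"Random Forest (university): {rf_university_acc:.2f}")
```

Training one model per target keeps the two predictions independent. Comparing candidates on the same held-out split is one simple way to compare algorithms on equal terms.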

Model Evaluation

Metrics Used: Accuracy, Precision, Recall, F1-Score

Model Performance:

Major Prediction: Achieved an accuracy of 96%
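
For reference, the four metrics listed above can be computed from true and predicted labels as in the sketch below. Macro averaging across classes is an assumption, since the averaging scheme used in the project is not stated. The label names and predictions are made-up examples, not project results.

```python
def classification_metrics(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall, and F1."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)

    precisions, recalls, f1_scores = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        # Guard against division by zero when a class is never predicted or never present.
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (
            2 * precision * recall / (precision + recall)
            if precision + recall
            else 0.0
        )
        precisions.append(precision)
        recalls.append(recall)
        f1_scores.append(f1)

    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Made-up labels for illustration only.
y_true = ["Medicine", "Medicine", "Computer Science", "Law", "Law", "Law"]
y_pred = ["Medicine", "Computer Science", "Computer Science", "Law", "Law", "Medicine"]

metrics = classification_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every class equally, which matters when some majors or universities have far fewer students than others. Accuracy alone can hide poor performance on those smaller classes.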

Conclusion

The Random Forest models provide reliable recommendations for both majors and universities for Algerian students. This system can assist students in making informed decisions based on their personal preferences and academic performance.

Future Work

Expansion of Dataset: Collect more data to improve model accuracy.

Incorporation of Additional Features: Include more features like extracurricular activities and family background.

Model Improvement: Explore advanced models and ensemble techniques to enhance prediction accuracy.