This project focuses on predicting used car prices using the **XGBoost regression algorithm**. It involves the following steps:
1. **Library Imports**: Necessary libraries for machine learning, data handling, and visualization are imported.
2. **Data Loading**: A dataset of used cars is loaded from a CSV file into a pandas DataFrame.
3. **Data Cleaning**:
- Rows with missing values in critical features (like `Price` and `EngineV`) are removed.
- Rows with unrealistic engine volumes are filtered out, keeping only values between 0.6 and 6.5.
- Duplicates are dropped to avoid redundancy.
4. **Feature Engineering**:
- A new feature, `Car_Age`, is created by subtracting the car's manufacturing year from the current year (2024).
5. **Feature and Target Selection**:
- Features like `Mileage`, `EngineV`, `Brand`, and `Car_Age` are selected as predictors (`X`).
- The target variable is `Price` (`y`).
6. **Preprocessing Pipeline**:
- Categorical features are encoded using **OneHotEncoder**.
- A **ColumnTransformer** ensures only categorical data is transformed, while numeric data remains unchanged.
7. **Model Pipeline**:
- An **XGBoost regressor** with 100 estimators is combined with the preprocessing steps into a pipeline.
8. **Model Evaluation**:
- The model is evaluated using 5-fold cross-validation, with **R² scores** calculated for performance assessment.
9. **Model Training and Prediction**:
- The model is trained on the dataset, and predictions for car prices are generated.
10. **Visualization**:
- A scatter plot compares actual and predicted car prices; points close to the diagonal indicate accurate predictions.
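The cleaning and feature-engineering steps (3 and 4) can be sketched in pandas roughly as follows. This is a minimal illustration, not the project's actual code: the tiny in-memory DataFrame stands in for the CSV, its values are made up, and the column name `Year` and the helper `clean_cars` are assumptions introduced here. It also assumes the 0.6 and 6.5 bounds are inclusive.

```python
import pandas as pd

# Made-up rows standing in for the used-car CSV (illustrative values only).
raw = pd.DataFrame(
    {
        "Brand": ["BMW", "Toyota", "Audi", "BMW", "Toyota", "Audi"],
        "Price": [4200.0, 7900.0, None, 4200.0, 13300.0, 9500.0],
        "Mileage": [277, 427, 358, 277, 120, 200],
        "EngineV": [2.0, 2.9, 5.0, 2.0, 99.99, None],
        "Year": [1991, 1999, 2003, 1991, 2007, 2011],  # assumed column name
    }
)

REFERENCE_YEAR = 2024  # the fixed "current year" from the description


def clean_cars(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: clean the data and add Car_Age."""
    # Drop rows missing a critical feature.
    out = df.dropna(subset=["Price", "EngineV"])
    # Keep only plausible engine volumes (bounds assumed inclusive).
    out = out[out["EngineV"].between(0.6, 6.5)]
    # Remove exact duplicate rows and renumber the index.
    out = out.drop_duplicates().reset_index(drop=True)
    # Age relative to the fixed reference year.
    out["Car_Age"] = REFERENCE_YEAR - out["Year"]
    return out


cars = clean_cars(raw)
print(cars)
```

In this toy input, one row is dropped for a missing price, one for a missing engine volume, one for an implausible engine volume, and one as a duplicate, leaving two rows.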
The project applies machine learning to predict car prices, using cross-validation to estimate how well the model generalizes and a plot of actual against predicted prices to inspect the fit visually.
Freelancer name | Mostafa H. |