Project Details

This project focuses on predicting used car prices using the **XGBoost regression algorithm**. It involves the following steps:

1. **Library Imports**: Necessary libraries for machine learning, data handling, and visualization are imported.

2. **Data Loading**: A dataset of used cars is loaded from a CSV file into a pandas DataFrame.

3. **Data Cleaning**:

- Rows with missing values in critical features (like `Price` and `EngineV`) are removed.

- Unrealistic engine volumes are filtered out, keeping only rows where `EngineV` is between 0.6 and 6.5.

- Duplicates are dropped to avoid redundancy.
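The cleaning rules above can be sketched with pandas as follows. This is a minimal illustration, not the project's actual code: the sample rows are invented, the `Year` column name is an assumption, and treating the 0.6 and 6.5 bounds as inclusive is also an assumption.

```python
import pandas as pd

# Invented sample rows standing in for the used-car CSV (the real file is not shown).
# "Year" is an assumed column name; "Price", "EngineV", "Mileage", "Brand" come from the description.
raw = pd.DataFrame({
    "Brand": ["BMW", "Audi", "Toyota", "Toyota", "Volkswagen", "Renault"],
    "Price": [4200.0, None, 13300.0, 13300.0, 7900.0, 2500.0],
    "Mileage": [277, 120, 358, 358, 427, 90],
    "EngineV": [2.0, 1.8, 5.0, 5.0, 99.99, None],
    "Year": [1991, 2012, 2003, 2003, 1999, 2008],
})


def clean_cars(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the three cleaning rules: drop missing values, filter engine volume, drop duplicates."""
    # 1. Remove rows with missing values in the critical columns.
    df = df.dropna(subset=["Price", "EngineV"])
    # 2. Keep only plausible engine volumes (bounds assumed inclusive).
    df = df[df["EngineV"].between(0.6, 6.5)]
    # 3. Drop exact duplicate rows.
    df = df.drop_duplicates()
    return df.reset_index(drop=True)


cleaned = clean_cars(raw)
print(cleaned)
```

In this sample, the row with a missing `Price`, the row with a missing `EngineV`, the row with an engine volume of 99.99, and the repeated Toyota row are all removed.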

4. **Feature Engineering**:

- A new feature, `Car_Age`, is created by subtracting the car's manufacturing year from the current year (2024).
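A minimal sketch of this feature, assuming the manufacturing year is stored in a column named `Year` (the column name is not given in the description):

```python
import pandas as pd

REFERENCE_YEAR = 2024  # the "current year" stated in the description

# Invented sample rows; "Year" is an assumed column name.
cars = pd.DataFrame({"Year": [1991, 2003, 2016]})

# Age in years relative to the fixed reference year.
cars["Car_Age"] = REFERENCE_YEAR - cars["Year"]
print(cars)
```

Keeping the reference year as a fixed constant makes the feature reproducible, whereas reading the year from the system clock would change the values each time the calendar year rolls over.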

5. **Feature and Target Selection**:

- Features like `Mileage`, `EngineV`, `Brand`, and `Car_Age` are selected as predictors (`X`).

- The target variable is `Price` (`y`).

6. **Preprocessing Pipeline**:

- Categorical features are encoded using **OneHotEncoder**.

- A **ColumnTransformer** applies the encoder only to the categorical columns and passes the numeric columns through unchanged.

7. **Model Pipeline**:

- An **XGBoost regressor** with 100 estimators is combined with the preprocessing steps into a pipeline.

8. **Model Evaluation**:

- The model is evaluated using 5-fold cross-validation, with **R² scores** calculated for performance assessment.

9. **Model Training and Prediction**:

- The model is trained on the dataset, and predictions for car prices are generated.

10. **Visualization**:

- A scatter plot compares actual and predicted car prices; the closer the points lie to the diagonal, the more accurate the predictions.
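A plot of this kind could be produced with matplotlib roughly as follows. The `actual` and `predicted` arrays are randomly generated placeholders, not results from the project, and the dashed reference line is an addition to make deviations easier to see.

```python
import matplotlib.pyplot as plt
import numpy as np

# Placeholder values; in the project these would be y and the model's predictions.
rng = np.random.default_rng(0)
actual = rng.uniform(2000, 30000, size=100)
predicted = actual + rng.normal(0, 2000, size=100)

fig, ax = plt.subplots(figsize=(6, 6))
ax.scatter(actual, predicted, alpha=0.6)

# Diagonal reference line: points on it are predicted exactly right.
lims = [min(actual.min(), predicted.min()), max(actual.max(), predicted.max())]
ax.plot(lims, lims, linestyle="--", color="red", label="Perfect prediction")

ax.set_xlabel("Actual price")
ax.set_ylabel("Predicted price")
ax.set_title("Actual vs. predicted car prices")
ax.legend()
```

Calling `plt.show()` or `fig.savefig(...)` afterwards displays or saves the figure.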

The project applies machine learning to predict used car prices, using cross-validation to estimate the model's accuracy and a plot of actual against predicted prices to check the results visually.

Freelancer: Mostafa H.