# Vehicle Price Prediction Engine

## Project Overview

In the highly competitive used car market, accurate pricing is critical for both buyers and dealerships. This project presents an end-to-end Machine Learning pipeline designed to predict the selling price of used vehicles based on various technical and market features (e.g., kilometers driven, fuel type, engine capacity, and max power).

**Key Achievement:** The final model reaches an **$R^2$ score of 97.5%**, turning raw, messy market data into usable price estimates.

## Tech Stack & Tools

* **Language:** Python

* **Data Processing & Cleaning:** Pandas, NumPy, Regular Expressions (RegEx)

* **Machine Learning:** Scikit-Learn (Random Forest Regressor, Linear Regression)

* **Evaluation Metrics:** $R^2$ Score, RMSE, Cross-Validation
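
For reference, the two headline metrics have the standard definitions below. Here $y_i$ is the actual selling price, $\hat{y}_i$ the predicted price, $\bar{y}$ the mean actual price, and $n$ the number of evaluated vehicles. These symbols are notation introduced here for illustration, not names taken from the project code.

$$
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2},
\qquad
\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}
$$

An $R^2$ of 1 means perfect predictions, while 0 means the model does no better than always predicting the mean price. RMSE is expressed in the same currency units as the price itself.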

## The Machine Learning Pipeline

### 1. Advanced Data Extraction & Cleaning

Real-world data is rarely clean. In the original dataset, several numeric features were stored as strings with unit suffixes (e.g., `'1248 CC'`, `'73.94 bhp'`, `'23.2 kmpl'`).

* Used **RegEx (Regular Expressions)** and string manipulation to extract the numeric part of each value.

* Handled missing values (NaNs) and dropped irrelevant columns to ensure data integrity.

* Applied **One-Hot Encoding** to convert categorical variables (Fuel, Seller Type, Transmission, Owner) into a machine-readable format.
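
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`engine`, `max_power`, `mileage`, `fuel`, `transmission`), the sample rows, and the helper `extract_number` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; the real dataset's column names may differ.
raw = pd.DataFrame({
    "engine": ["1248 CC", "1498 CC", None],
    "max_power": ["73.94 bhp", "103.52 bhp", "bhp"],
    "mileage": ["23.2 kmpl", "21.14 kmpl", "17.7 kmpl"],
    "fuel": ["Diesel", "Petrol", "Diesel"],
    "transmission": ["Manual", "Automatic", "Manual"],
})


def extract_number(series: pd.Series) -> pd.Series:
    """Return the first integer or decimal found in each string (NaN if none)."""
    digits = series.str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    return pd.to_numeric(digits, errors="coerce")


clean = raw.copy()
numeric_cols = ["engine", "max_power", "mileage"]
for col in numeric_cols:
    clean[col] = extract_number(clean[col])

# Drop rows where a numeric feature could not be recovered.
clean = clean.dropna(subset=numeric_cols)

# One-hot encode the categorical columns.
encoded = pd.get_dummies(clean, columns=["fuel", "transmission"])
print(encoded.columns.tolist())
```

Dropping incomplete rows is only one way to handle missing values; imputing them (for example with the column median) is a common alternative when the dataset is small.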

### 2. Model Selection & Training

To find the optimal solution, I evaluated multiple algorithms:

* **Baseline Model:** A standard Linear Regression model struggled to capture the non-linear relationships in car pricing, yielding an $R^2$ score of roughly 60%.

* **Advanced Model (Random Forest):** Implemented a `RandomForestRegressor` (with 100 estimators) and scaled the features using `StandardScaler`. Tree-based models are largely insensitive to feature scaling, so the gain comes from the ensemble itself, which captures non-linear effects and feature interactions that the linear baseline misses.
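
A comparison of this kind can be set up as in the sketch below. It runs on synthetic data generated inside the example, so the feature meanings, the price formula, the 80/20 split, and the random seeds are all assumptions; the scores it produces say nothing about the real dataset.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix (assumed features).
rng = np.random.default_rng(42)
n = 500
km_driven = rng.uniform(5_000, 200_000, n)
engine_cc = rng.uniform(800, 3_000, n)
max_power = rng.uniform(40, 250, n)
X = np.column_stack([km_driven, engine_cc, max_power])

# Made-up non-linear price: value grows with power and decays with mileage.
y = (50_000 + 20 * engine_cc
     + 4_000 * max_power * np.exp(-km_driven / 80_000)
     + rng.normal(0, 5_000, n))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "linear": make_pipeline(StandardScaler(), LinearRegression()),
    "random_forest": make_pipeline(
        StandardScaler(),
        RandomForestRegressor(n_estimators=100, random_state=42),
    ),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "r2": r2_score(y_test, pred),
        "rmse": float(np.sqrt(mean_squared_error(y_test, pred))),
    }
    print(f"{name}: R2={results[name]['r2']:.3f}, RMSE={results[name]['rmse']:.0f}")
```

Wrapping the scaler and the estimator in one pipeline means the scaler is fitted on the training split only, which avoids leaking test-set statistics into training.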

### 3. Final Evaluation & Results

The Random Forest model clearly outperformed the linear baseline:

* **Final $R^2$ Score: 97.5%**

* **Robustness:** Validated the model using 5-fold Cross-Validation. Scores were consistently high across all folds (mean $R^2$: ~96.2%), which suggests the model generalizes well rather than overfitting a single train/test split.
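
A 5-fold check of this kind can be run with scikit-learn's `cross_val_score`, as in the sketch below. The data here comes from `make_regression` purely as a placeholder for the cleaned car features, and the shuffling and seeds are assumptions, so the printed scores are not the project's results.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

# Placeholder data standing in for the cleaned car features and prices.
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=0)
cv = KFold(n_splits=5, shuffle=True, random_state=0)

# One R^2 score per fold; each fold is held out exactly once.
scores = cross_val_score(model, X, y, cv=cv, scoring="r2")
print("per-fold R2:", np.round(scores, 3))
print(f"mean={scores.mean():.3f}, std={scores.std():.3f}")
```

A small standard deviation across folds is the signal of stability; a high mean alone does not rule out a model that does well on some subsets and poorly on others.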

## Business Value

This predictive engine demonstrates the ability to take raw, unstructured data scraped from classified websites, clean it efficiently, and build a machine learning model that estimates market prices automatically and accurately. The same workflow carries over to real estate pricing, e-commerce dynamic pricing, and inventory valuation.
