# Vehicle Price Prediction Engine
## Project Overview
In the highly competitive used car market, accurate pricing is critical for both buyers and dealerships. This project presents an end-to-end Machine Learning pipeline designed to predict the selling price of used vehicles based on various technical and market features (e.g., kilometers driven, fuel type, engine capacity, and max power).
**Key Achievement:** The final model achieves an **$R^2$ score of 0.975 (97.5%)**, turning raw, messy market data into usable price estimates.
## Tech Stack & Tools
* **Language:** Python
* **Data Processing & Cleaning:** Pandas, NumPy, Regular Expressions (RegEx)
* **Machine Learning:** Scikit-Learn (Random Forest Regressor, Linear Regression)
* **Evaluation Metrics:** $R^2$ Score, RMSE, Cross-Validation
## The Machine Learning Pipeline
### 1. Advanced Data Extraction & Cleaning
Real-world data is rarely clean. In the original dataset, several numeric features were stored as strings with unit suffixes (e.g., `'1248 CC'`, `'73.94 bhp'`, `'23.2 kmpl'`).
* Used **regular expressions (RegEx)** and string manipulation to extract the numeric part of each value.
* Handled missing values (NaNs) and dropped irrelevant columns to ensure data integrity.
* Applied **One-Hot Encoding** to convert categorical variables (Fuel, Seller Type, Transmission, Owner) into a machine-readable format.
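
The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows, the column names (`engine`, `max_power`, `mileage`, `fuel`, `selling_price`), and the helper `extract_number` are all assumptions made for the example.

```python
import pandas as pd

# Hypothetical sample rows; the column names are assumptions, not the real schema.
raw = pd.DataFrame({
    "engine": ["1248 CC", "1498 CC", None],
    "max_power": ["73.94 bhp", "103.52 bhp", "bhp"],
    "mileage": ["23.2 kmpl", "21.14 kmpl", "17.7 kmpl"],
    "fuel": ["Diesel", "Petrol", "Diesel"],
    "selling_price": [450000, 370000, 158000],
})


def extract_number(series: pd.Series) -> pd.Series:
    """Return the first integer or decimal found in each string (NaN if none)."""
    extracted = series.str.extract(r"(\d+(?:\.\d+)?)", expand=False)
    return pd.to_numeric(extracted, errors="coerce")


clean = raw.copy()
numeric_cols = ["engine", "max_power", "mileage"]
for col in numeric_cols:
    clean[col] = extract_number(clean[col])

# Drop rows where a numeric feature could not be recovered.
clean = clean.dropna(subset=numeric_cols)

# One-hot encode the categorical column (only `fuel` in this toy example).
clean = pd.get_dummies(clean, columns=["fuel"])

print(clean)
```

Dropping incomplete rows is only one way to handle missing values; imputation would be an equally reasonable choice depending on how much data is affected.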
### 2. Model Selection & Training
To find the optimal solution, I evaluated multiple algorithms:
* **Baseline Model:** A standard Linear Regression model struggled to capture the non-linear relationships in car pricing, yielding an $R^2$ score of roughly 60%.
* **Advanced Model (Random Forest):** Implemented a `RandomForestRegressor` (with 100 estimators) and scaled the features using `StandardScaler`. Tree-based models are largely insensitive to feature scale, so the scaler is optional here. The ensemble captured the feature interactions that the linear baseline missed.
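
A comparison along these lines might look like the sketch below. It runs on synthetic data with an invented non-linear price formula, so the scores it prints say nothing about the real dataset. Only `n_estimators=100` and the use of `StandardScaler` come from the description above; the train/test split, the random seeds, and the feature names are assumptions.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the cleaned feature matrix (assumption for illustration).
rng = np.random.default_rng(0)
n_samples = 500
km_driven = rng.uniform(5_000, 150_000, n_samples)
engine_cc = rng.uniform(800, 2_500, n_samples)
max_power = rng.uniform(40, 200, n_samples)
X = np.column_stack([km_driven, engine_cc, max_power])

# Invented non-linear price: power matters less as mileage grows.
y = (
    2_000 * max_power * np.exp(-km_driven / 50_000)
    + 0.05 * engine_cc ** 1.5
    + rng.normal(0, 5_000, n_samples)
)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Putting the scaler inside the pipeline means it is fitted on training data only.
baseline = make_pipeline(StandardScaler(), LinearRegression())
forest = make_pipeline(
    StandardScaler(),
    RandomForestRegressor(n_estimators=100, random_state=42),
)

baseline.fit(X_train, y_train)
forest.fit(X_train, y_train)

linear_r2 = r2_score(y_test, baseline.predict(X_test))
forest_r2 = r2_score(y_test, forest.predict(X_test))
print(f"Linear Regression R^2: {linear_r2:.3f}")
print(f"Random Forest R^2:     {forest_r2:.3f}")
```

Wrapping both models in a pipeline keeps the preprocessing identical across the two, so any difference in $R^2$ comes from the estimator itself.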
### 3. Final Evaluation & Results
The Random Forest model clearly outperformed the linear baseline:
* **Final $R^2$ Score: 97.5%**
* **Robustness:** Validated the model using 5-fold Cross-Validation, which gave consistently high scores across all folds (mean $R^2$: ~96.2%). The small gap between this and the final score suggests the model generalizes well and is not severely overfitting.
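
A 5-fold check of this kind can be sketched with scikit-learn's `cross_validate`. The example below uses the built-in synthetic `make_friedman1` dataset rather than the car data, and the shuffling, seeds, and choice to also report RMSE and training scores are assumptions added for illustration.

```python
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_validate

# Synthetic regression data standing in for the cleaned car dataset (assumption).
X, y = make_friedman1(n_samples=300, noise=1.0, random_state=0)

model = RandomForestRegressor(n_estimators=100, random_state=42)
cv = KFold(n_splits=5, shuffle=True, random_state=42)

results = cross_validate(
    model,
    X,
    y,
    cv=cv,
    scoring=("r2", "neg_root_mean_squared_error"),
    return_train_score=True,
)

test_r2 = results["test_r2"]
train_r2 = results["train_r2"]
# scikit-learn reports RMSE negated so that higher is always better; flip the sign.
test_rmse = -results["test_neg_root_mean_squared_error"]

print("Per-fold test R^2:", test_r2.round(3))
print(f"Mean test R^2:  {test_r2.mean():.3f} (std {test_r2.std():.3f})")
print(f"Mean train R^2: {train_r2.mean():.3f}")
print(f"Mean test RMSE: {test_rmse.mean():.3f}")
```

Comparing the mean training score with the mean held-out score is a quick way to gauge overfitting: a large gap between the two is a warning sign, while similar values across folds point to stable behavior.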
## Business Value
This predictive engine demonstrates the ability to take raw, unstructured data scraped from classified websites, clean it efficiently, and build a machine learning model that estimates market prices automatically and accurately. The same workflow applies to other pricing problems, such as real estate valuation, dynamic e-commerce pricing, and inventory valuation.