ML Data Preparation – E-Commerce Sales Forecasting
Cleaned and transformed a raw e-commerce transaction dataset (200 records × 10 features) into a fully numeric dataset ready for training a sales forecasting machine learning model.
What was done:
Data Cleaning: Detected 32 outliers across the Qty, Price, and Discount columns and capped them at IQR-based bounds (IQR capping) instead of removing them, so all 200 rows were preserved
Missing Values: Ran a full data quality audit, which confirmed the dataset was 100% complete (no missing values), so no imputation was needed
Feature Engineering: Created 9 new features, including the target variable Revenue = Qty × Price × (1 − Discount), with Discount expressed as a fraction, and the time-based features Month, DayOfWeek, and DayOfMonth
Encoding: Applied Label Encoding to Category (3 classes) and City (5 cities), Binary Encoding to Return, and Frequency Encoding to Product
Scaling: Applied StandardScaler (μ = 0, σ = 1) to 9 numeric columns so that features measured on different scales, such as Price and Qty, contribute comparably to scale-sensitive models
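The IQR capping used in the Data Cleaning step can be sketched in pandas as follows. This is a minimal sketch, not the original notebook: the helper name `iqr_cap`, the conventional 1.5 × IQR multiplier, and the toy values are assumptions for illustration.

```python
import pandas as pd


def iqr_cap(df: pd.DataFrame, columns: list[str], k: float = 1.5):
    """Clip each column to [Q1 - k*IQR, Q3 + k*IQR] (hypothetical helper; k=1.5 assumed).

    Returns the capped copy and the number of values that were changed.
    """
    capped = df.copy()
    n_capped = 0
    for col in columns:
        q1, q3 = capped[col].quantile(0.25), capped[col].quantile(0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        # Count values outside the bounds before clipping them
        n_capped += int(((capped[col] < lower) | (capped[col] > upper)).sum())
        # Move outliers to the nearest bound; no rows are removed
        capped[col] = capped[col].clip(lower=lower, upper=upper)
    return capped, n_capped


# Toy data (hypothetical): one extreme quantity and one extreme price
df = pd.DataFrame({
    "Qty": [1, 2, 2, 3, 3, 4, 50],
    "Price": [10.0, 12.0, 11.0, 13.0, 12.5, 11.5, 900.0],
})
capped_df, n_capped = iqr_cap(df, ["Qty", "Price"])
print(n_capped, len(capped_df))
print(capped_df)
```

Because `clip` moves extreme values to the nearest bound instead of dropping them, the row count never changes, which is how capping can treat outliers while keeping every row.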
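A missing-value audit like the one in the Missing Values step can be as simple as counting nulls per column. The helper `missing_report` and the toy frame below are hypothetical; the toy data deliberately contains gaps so the report has something to show.

```python
import numpy as np
import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Count and percentage of missing values per column (hypothetical helper)."""
    counts = df.isna().sum()
    return pd.DataFrame({
        "missing": counts,
        "missing_pct": (counts / len(df) * 100).round(2),
    })


# Toy frame with two gaps (hypothetical values)
df = pd.DataFrame({
    "Qty": [1, 2, 3],
    "City": ["A", "B", None],
    "Price": [10.0, np.nan, 12.0],
})
report = missing_report(df)
print(report)
# The dataset needs no imputation only when every count is zero
print("complete:", bool(report["missing"].eq(0).all()))
```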
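The Revenue target and the calendar features from the Feature Engineering step might be derived as below. This sketch assumes a date column named `Date` and a Discount stored as a fraction (0.10 for 10%); the other five engineered features are not named in the summary, so they are not reproduced. The helper `add_features` and the toy rows are hypothetical.

```python
import pandas as pd


def add_features(df: pd.DataFrame, date_col: str = "Date") -> pd.DataFrame:
    """Add Revenue plus calendar features (hypothetical helper; `date_col` name assumed)."""
    out = df.copy()
    # Target: net revenue after discount, with Discount as a fraction
    out["Revenue"] = out["Qty"] * out["Price"] * (1 - out["Discount"])
    dates = pd.to_datetime(out[date_col])
    out["Month"] = dates.dt.month
    out["DayOfWeek"] = dates.dt.dayofweek  # Monday=0 ... Sunday=6
    out["DayOfMonth"] = dates.dt.day
    return out


df = pd.DataFrame({
    "Date": ["2024-01-15", "2024-03-02"],
    "Qty": [2, 5],
    "Price": [100.0, 20.0],
    "Discount": [0.10, 0.0],
})
feat = add_features(df)
print(feat[["Revenue", "Month", "DayOfWeek", "DayOfMonth"]])
```

Because Revenue is computed exactly from Qty, Price, and Discount, those three columns fully determine the target; when they are used as model inputs, they need to be values that are actually known at forecast time.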
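The three encodings in the Encoding step can be sketched with plain pandas. Assumptions: Return holds the strings "Yes"/"No", the column names match the summary, and `encode` plus the toy values are hypothetical. For label encoding, pandas category codes follow the sorted order of the unique values, which is the same mapping scikit-learn's `LabelEncoder` produces.

```python
import pandas as pd


def encode(df: pd.DataFrame) -> pd.DataFrame:
    """Apply label, binary, and frequency encodings (hypothetical helper)."""
    out = df.copy()
    # Label encoding: integer codes in sorted order of the unique values
    for col in ["Category", "City"]:
        out[col] = out[col].astype("category").cat.codes
    # Binary encoding of Return; assumes raw values are "Yes"/"No"
    out["Return"] = out["Return"].map({"No": 0, "Yes": 1})
    # Frequency encoding: replace each product with its share of all rows
    freq = out["Product"].value_counts(normalize=True)
    out["Product"] = out["Product"].map(freq)
    return out


df = pd.DataFrame({
    "Category": ["Toys", "Books", "Toys", "Electronics"],
    "City": ["Delta", "Alpha", "Alpha", "Charlie"],
    "Return": ["No", "Yes", "No", "No"],
    "Product": ["P1", "P2", "P1", "P1"],
})
encoded = encode(df)
print(encoded)
```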
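`StandardScaler` transforms each column as z = (x − μ) / σ, with σ computed as the population standard deviation (ddof=0). The sketch below reproduces that arithmetic in pandas, fitting μ and σ on a training split and applying them to a test split. The split and the helper `standardize` are illustrative assumptions, since the summary does not describe how the data was divided.

```python
import pandas as pd


def standardize(train: pd.DataFrame, test: pd.DataFrame, columns: list[str]):
    """Z-score scaling fit on `train`, applied to both frames (hypothetical helper)."""
    mean = train[columns].mean()
    # Population std (ddof=0), matching StandardScaler
    std = train[columns].std(ddof=0)
    # Like StandardScaler, divide zero-variance columns by 1 instead of 0
    std = std.where(std != 0, 1.0)
    train_s, test_s = train.copy(), test.copy()
    for col in columns:
        train_s[col] = (train[col] - mean[col]) / std[col]
        test_s[col] = (test[col] - mean[col]) / std[col]
    return train_s, test_s


train = pd.DataFrame({"Price": [10.0, 20.0, 30.0], "Qty": [1, 1, 1]})
test = pd.DataFrame({"Price": [40.0], "Qty": [1]})
train_s, test_s = standardize(train, test, ["Price", "Qty"])
print(train_s)
print(test_s)
```

Fitting the statistics on training rows only keeps information from the evaluation rows out of the transformation; with scikit-learn the same pattern is `scaler.fit_transform(X_train)` followed by `scaler.transform(X_test)`.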
Deliverables: A 4-sheet Excel workbook containing Raw Data, Cleaned Data, ML-Ready Data, and a full Quality Report
Tools Used: Python · Pandas · NumPy · scikit-learn · Matplotlib · OpenPyXL · Jupyter Notebook
Result: The dataset grew from 10 raw columns to 17 fully numeric, model-ready columns, with Revenue as the recommended target variable and Month, Category, City, Price, and Qty as the strongest predictors.