تفاصيل العمل

I built a complete machine-learning pipeline to predict rainfall tomorrow (weather classification) using the Australian weather dataset. The project focuses on robust preprocessing and model comparison: I handled heavy missing data (column drop + KNN imputation + mode filling), extracted date features, encoded categoricals, removed outliers, and engineered a clean feature set. I then trained and tuned a variety of classical models — Logistic Regression, Decision Tree, Random Forest, Gradient Boosting, XGBoost, AdaBoost, and an MLP — and compared them using cross-validation and test metrics. For the best models I performed hyperparameter search (GridSearchCV), inspected feature importances, and validated results with confusion matrices and error analysis to choose a reliable production candidate.

What I delivered

End-to-end preprocessing: null-value analysis, column filtering (>60% null removal), KNN imputation for numeric gaps, mode filling for categoricals, date → month extraction, label encoding.

Data quality work: exploratory histograms, correlation heatmap, IQR outlier filtering, boxplots for feature distributions.

Multiple models & model selection: baseline Logistic Regression, tree-based ensembles (DecisionTree, RandomForest, GradientBoosting, XGBoost, AdaBoost), and an MLP pipeline with scaling and early stopping.

Model improvement: GridSearchCV for hyperparameter tuning, pipeline usage for scaling + MLP, ensemble strategies and model comparison via accuracy / confusion matrix / other classification metrics.

Reproducible notebook with saved preprocessed dataset and visualization assets.

ملفات مرفقة

بطاقة العمل

اسم المستقل
عدد الإعجابات
0
عدد المشاهدات
41
تاريخ الإضافة
تاريخ الإنجاز
المهارات