Project Description
This project builds a binary classifier that predicts next-day rainfall in Australia from the weatherAUS dataset. It uses daily meteorological observations to forecast whether it will rain tomorrow, which makes it suitable as a component of weather prediction applications.
Dataset Overview
Source: weatherAUS.csv (Australian Weather Dataset)
Target Variable: RainTomorrow (Binary classification: Yes/No)
Features: 20+ meteorological variables including:
Temperature measurements (MinTemp, MaxTemp, Temp9am, Temp3pm)
Wind data (WindGustSpeed, WindSpeed9am, WindSpeed3pm, Wind directions)
Humidity levels (Humidity9am, Humidity3pm)
Atmospheric pressure (Pressure9am, Pressure3pm)
Cloud coverage and rainfall measurements
Location and temporal features (Month, Day)
Technical Implementation
Data Preprocessing
Feature Engineering: Extracted temporal features (month, day) from date information
Missing Data Handling: Removed columns with >40% missing values
Encoding: Converted categorical variables (RainToday, RainTomorrow) to binary format
Pipeline Architecture: Implemented a preprocessing pipeline with:
Median imputation and StandardScaler for numerical features
Constant imputation and OneHotEncoder for categorical features
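The preprocessing steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the helper names `prepare_frame` and `build_preprocessor`, the `"Date"` column name, the `"missing"` fill value, and the tiny demo frame are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler


def prepare_frame(df: pd.DataFrame, max_missing: float = 0.40) -> pd.DataFrame:
    """Hypothetical helper: clean a weatherAUS-style frame."""
    df = df.copy()

    # Extract month and day, then discard the raw date column
    # (the column name "Date" is an assumption).
    dates = pd.to_datetime(df.pop("Date"))
    df["Month"] = dates.dt.month
    df["Day"] = dates.dt.day

    # Drop columns whose share of missing values exceeds the threshold.
    missing_share = df.isna().mean()
    df = df.drop(columns=missing_share[missing_share > max_missing].index)

    # Map Yes/No to 1/0; missing entries remain NaN.
    for col in ("RainToday", "RainTomorrow"):
        if col in df.columns:
            df[col] = df[col].map({"Yes": 1, "No": 0})
    return df


def build_preprocessor(X: pd.DataFrame) -> ColumnTransformer:
    """Hypothetical helper: median + scaling for numeric columns,
    constant fill + one-hot encoding for categorical columns."""
    numeric_cols = X.select_dtypes(include="number").columns.tolist()
    categorical_cols = X.select_dtypes(exclude="number").columns.tolist()

    numeric = Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ])
    categorical = Pipeline([
        ("impute", SimpleImputer(strategy="constant", fill_value="missing")),
        ("encode", OneHotEncoder(handle_unknown="ignore")),
    ])
    return ColumnTransformer([
        ("num", numeric, numeric_cols),
        ("cat", categorical, categorical_cols),
    ])


# Tiny made-up frame standing in for weatherAUS.csv.
demo = pd.DataFrame({
    "Date": ["2010-01-05", "2010-02-10", "2010-03-15", "2010-04-20"],
    "Location": ["Sydney", "Perth", np.nan, "Sydney"],
    "MinTemp": [12.0, np.nan, 15.5, 9.0],
    "Sunshine": [np.nan, np.nan, np.nan, 7.0],  # 75% missing, so dropped
    "RainToday": ["No", "Yes", "No", "No"],
    "RainTomorrow": ["Yes", "No", "No", "Yes"],
})
clean = prepare_frame(demo)
X = clean.drop(columns="RainTomorrow")
features = build_preprocessor(X).fit_transform(X)
```

Fitting the preprocessor inside a pipeline on the training split only keeps the imputation medians and scaling statistics from being influenced by test rows.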
Machine Learning Models
Five classification algorithms were implemented and evaluated:
Logistic Regression - Accuracy: 99.85%
K-Nearest Neighbors (k=5) - Accuracy: 87.56%
Gaussian Naive Bayes - Accuracy: 79.63%
Decision Tree - Accuracy: 100%
Random Forest - Accuracy: 100%
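The five classifiers listed above could be instantiated along these lines. Only k=5 comes from this document; the `make_models` helper, `max_iter=1000`, and the `random_state` value are assumptions for the sketch, and the notebook's actual hyperparameters may differ.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier


def make_models(random_state: int = 42) -> dict:
    """Hypothetical helper returning one estimator per algorithm."""
    return {
        # max_iter is raised here as an assumption to help convergence.
        "Logistic Regression": LogisticRegression(max_iter=1000),
        "K-Nearest Neighbors": KNeighborsClassifier(n_neighbors=5),
        "Gaussian Naive Bayes": GaussianNB(),
        "Decision Tree": DecisionTreeClassifier(random_state=random_state),
        "Random Forest": RandomForestClassifier(random_state=random_state),
    }


models = make_models()
```

Note that GaussianNB requires a dense feature matrix, so any one-hot encoded features passed to it must not be in sparse form.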
Model Evaluation
Train-test split: 80/20 with stratification
Comprehensive metrics: Accuracy, Precision, Recall, F1-Score
Confusion matrices for detailed performance analysis
Classification reports for class-wise evaluation
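The evaluation procedure above can be illustrated with the following sketch. It runs on synthetic data from `make_classification` rather than weatherAUS, so its scores say nothing about the results reported here; the `evaluate` helper and the `random_state` values are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    confusion_matrix,
    precision_recall_fscore_support,
)
from sklearn.model_selection import train_test_split


def evaluate(model, X, y, random_state: int = 42) -> dict:
    """Hypothetical helper: stratified 80/20 split, fit, and score."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Precision, recall, and F1 for the positive ("rain") class.
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_test, y_pred, average="binary", zero_division=0
    )
    return {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": float(precision),
        "recall": float(recall),
        "f1": float(f1),
        "confusion_matrix": confusion_matrix(y_test, y_pred),
        "report": classification_report(y_test, y_pred, zero_division=0),
    }


# Synthetic, imbalanced stand-in data (not the real dataset).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.78, 0.22], random_state=0
)
results = evaluate(LogisticRegression(max_iter=1000), X, y)
print(results["report"])
```

Stratifying on the target keeps the rain/no-rain ratio the same in both splits, which matters when one class is much rarer than the other.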
Key Results
The tree-based models (Decision Tree and Random Forest, of which only Random Forest is an ensemble method) achieved 100% accuracy on the test set. Logistic Regression reached 99.85% accuracy, providing a simpler alternative with nearly identical performance.
Technologies Used
Python 3.11
Libraries:
pandas, numpy (Data manipulation)
scikit-learn (Machine learning models and preprocessing)
Jupyter Notebook (Development environment)
Deliverables
Complete Jupyter Notebook with all code and analysis
Model accuracy summary report (CSV format)
Trained pipeline ready for deployment
Comprehensive evaluation metrics and visualizations
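One way the trained pipeline and the accuracy summary might be written to disk is sketched below. The file names, the temporary output directory, the stand-in pipeline, and the use of `joblib` are all assumptions; this document does not state how the deliverables were actually saved.

```python
import tempfile
from pathlib import Path

import joblib
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Stand-in pipeline fitted on synthetic data (not the project's model).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)
pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression(max_iter=1000)),
]).fit(X, y)

# Hypothetical output locations.
out_dir = Path(tempfile.mkdtemp())
pipeline_path = out_dir / "rain_pipeline.joblib"
summary_path = out_dir / "model_accuracy_summary.csv"

# Persist the fitted pipeline so preprocessing and model travel together.
joblib.dump(pipeline, pipeline_path)

# Placeholder summary row: training accuracy on the synthetic data only.
summary = pd.DataFrame(
    [{"model": "Logistic Regression", "accuracy": pipeline.score(X, y)}]
)
summary.to_csv(summary_path, index=False)

# Reload to confirm the saved pipeline predicts identically.
restored = joblib.load(pipeline_path)
```

Saving the whole pipeline, rather than the bare estimator, means new observations can be passed in raw form and receive the same imputation, scaling, and encoding used in training.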
Project Applications
This weather prediction system can be utilized for:
Agricultural planning and crop management
Outdoor event planning
Transportation and logistics optimization
Emergency preparedness and disaster management
General weather forecasting services