Machine learning classification and data cleaning

Project details

This project involves several data processing and analysis steps: loading a dataset, handling duplicates and missing values, and applying transformations such as encoding and normalization. Here is an overview of the task:

1. **Data Cleaning and Preparation**:

- **Duplicate Handling**: Checks for duplicate rows in the dataset.

- **Missing Values**: Fills missing values and exports the filled data.

- **Encoding**: Converts categorical values (like `bechdelRating`) into dummy/indicator variables for easier model integration.

- **Normalization**: Applies MinMaxScaler to numerical columns like `runtimeMinutes` and `imdbAverageRating` and visualizes the before/after normalization with histograms.
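The cleaning steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the sample rows are made up, the helper name `clean_and_prepare` is hypothetical, and both dropping duplicates (the description only says they are checked) and filling with the column median are assumptions, since the description does not say how these were handled. The histograms and the export step are left out.

```python
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Tiny made-up sample standing in for the real dataset (illustrative values only).
df = pd.DataFrame({
    "runtimeMinutes": [90.0, 120.0, None, 90.0, 150.0],
    "imdbAverageRating": [6.5, 8.1, 7.0, 6.5, None],
    "bechdelRating": [3, 1, 2, 3, 0],
})

def clean_and_prepare(frame):
    """Hypothetical helper: count/drop duplicates, fill gaps, encode, normalize."""
    out = frame.copy()

    # Duplicate handling: count fully identical rows, then drop them (assumption).
    n_duplicates = int(out.duplicated().sum())
    out = out.drop_duplicates().reset_index(drop=True)

    # Missing values: fill numeric gaps with the column median (assumed strategy).
    numeric_cols = ["runtimeMinutes", "imdbAverageRating"]
    out[numeric_cols] = out[numeric_cols].fillna(out[numeric_cols].median())

    # Encoding: turn bechdelRating into dummy/indicator columns.
    out = pd.get_dummies(out, columns=["bechdelRating"], prefix="bechdel")

    # Normalization: rescale the numeric columns to the [0, 1] range.
    out[numeric_cols] = MinMaxScaler().fit_transform(out[numeric_cols])
    return out, n_duplicates

cleaned, n_duplicates = clean_and_prepare(df)
print(f"duplicate rows found: {n_duplicates}")
print(cleaned)
```

Plotting `df["runtimeMinutes"]` and `cleaned["runtimeMinutes"]` as histograms side by side would give the before/after comparison mentioned above.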

2. **Classification**:

- **Rating Classification**: A custom function classifies IMDb ratings into categories ('Bad', 'Good', 'Excellent') based on set thresholds.

- **Decision Tree Classifier**: Splits the data into training and test sets, trains a decision tree model, and evaluates it using accuracy, precision, recall, and F1 score. The results are saved to a file.
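A rough sketch of the classification step is shown below. The thresholds (5.0 and 7.5), the feature columns, the tree depth, the 80/20 split, and the weighted averaging of the metrics are all assumptions chosen for illustration; the description does not state the values actually used. The data is randomly generated, so the scores themselves carry no meaning.

```python
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def classify_rating(rating, bad_below=5.0, excellent_from=7.5):
    """Map a numeric rating to a label. Thresholds are assumed, not the project's."""
    if rating < bad_below:
        return "Bad"
    if rating < excellent_from:
        return "Good"
    return "Excellent"

# Synthetic stand-in data (random values, for demonstration only).
rng = np.random.default_rng(0)
n = 300
data = pd.DataFrame({
    "runtimeMinutes": rng.integers(70, 180, size=n),
    "year": rng.integers(1970, 2024, size=n),
    "imdbAverageRating": rng.uniform(2.0, 9.5, size=n).round(1),
})
data["ratingClass"] = data["imdbAverageRating"].apply(classify_rating)

# Assumed feature set; the target is the derived rating class.
X = data[["runtimeMinutes", "year"]]
y = data["ratingClass"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = DecisionTreeClassifier(max_depth=4, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

# Weighted averages account for the three classes having different sizes.
accuracy = accuracy_score(y_test, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_test, y_pred, average="weighted", zero_division=0
)
metrics = {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}
print(metrics)
```

The `metrics` dictionary could then be written out with pandas to produce a results file like the one the project saves.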

3. **Regression**:

- **Linear Regression**: Trains a linear regression model to predict IMDb ratings based on features like `runtimeMinutes` and `year`. The performance is evaluated using RMSE (Root Mean Squared Error) and MAE (Mean Absolute Error), and the predictions are visualized with a scatter plot of actual vs. predicted values.
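The regression step might look something like the sketch below. The data is synthetic (a made-up linear relationship plus noise), and the 80/20 split and random seeds are assumptions; only the feature names, the model type, and the two error metrics come from the description. RMSE is computed as the square root of the mean squared error.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: ratings follow an invented linear trend plus noise.
rng = np.random.default_rng(1)
n = 200
movies = pd.DataFrame({
    "runtimeMinutes": rng.integers(70, 180, size=n),
    "year": rng.integers(1970, 2024, size=n),
})
movies["imdbAverageRating"] = (
    4.0
    + 0.02 * movies["runtimeMinutes"]
    - 0.001 * (movies["year"] - 1970)
    + rng.normal(0.0, 0.5, size=n)
)

X = movies[["runtimeMinutes", "year"]]
y = movies["imdbAverageRating"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

reg = LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

# RMSE penalizes large errors more heavily; MAE is the average absolute miss.
rmse = float(np.sqrt(mean_squared_error(y_test, y_pred)))
mae = float(mean_absolute_error(y_test, y_pred))
print(f"RMSE: {rmse:.3f}  MAE: {mae:.3f}")
```

Plotting `y_test` against `y_pred` as a scatter plot gives the actual vs. predicted view; points close to the diagonal indicate good predictions.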

Together, these steps take the data from cleaning and transformation through to modeling, covering both a classification task and a regression task.

Attached files

- classification11.xlsx (XLSX, 20.9KB)
- normalizeddata-2.xlsx (XLSX, 906.93KB)

Project card

Freelancer	Basmala T.
Likes	0
Views	11
Date added	13/10/2024
Completion date	16/04/2024
