
Data Joker Preprocessor is an interactive Python GUI tool for robust data preprocessing and exploratory data analysis, designed to help you prepare datasets for machine learning and analytics workflows with minimal code.

Features

Load Data: Supports CSV, Excel, JSON, and Parquet files, with handling for text encodings and read errors.

Initial Analysis: Get a summary of data shape, memory usage, missing values, duplicates, constant columns, and column types.

Target Identification: Select your target column for supervised learning, or skip this step for unsupervised tasks.

Data Cleaning:

Handle missing values (drop rows/columns, fill with mean/median/mode/custom value).

Remove duplicates.

Handle outliers (remove, cap, or transform).

Feature Engineering:

Feature scaling (StandardScaler, MinMaxScaler).

Encode categorical columns (choose One-Hot or Label encoding per column).

Drop columns interactively.

Analysis:

Correlation analysis and heatmap visualization.

Statistical summary of numerical and categorical columns.

Export:

Export processed data in various formats (CSV, Excel, JSON, Parquet).

Export the entire preprocessing pipeline as executable Python code.

Save and load preprocessing steps for reproducibility.

User-Friendly GUI:

Scrollable control panel and results area.

Data preview with scrollable table.

Step-by-step tracking of all preprocessing actions.
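The Load Data feature above can be sketched as a dispatch on file extension. This assumes pandas does the reading (the README does not name the underlying library), and the function name `load_data` and the CSV encoding fallback order are hypothetical choices, not the tool's own code.

```python
from pathlib import Path

import pandas as pd

# Hypothetical fallback order for CSV text encodings. latin-1 comes last
# because it can decode any byte sequence, so it never raises UnicodeDecodeError.
CSV_ENCODINGS = ("utf-8", "cp1252", "latin-1")


def load_data(path):
    """Load a CSV, Excel, JSON, or Parquet file into a DataFrame (sketch)."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        for encoding in CSV_ENCODINGS[:-1]:
            try:
                return pd.read_csv(path, encoding=encoding)
            except UnicodeDecodeError:
                continue  # wrong guess: try the next encoding
        return pd.read_csv(path, encoding=CSV_ENCODINGS[-1])
    if suffix in (".xlsx", ".xls"):
        return pd.read_excel(path)  # requires openpyxl (.xlsx) or xlrd (.xls)
    if suffix == ".json":
        return pd.read_json(path)
    if suffix == ".parquet":
        return pd.read_parquet(path)  # requires pyarrow or fastparquet
    raise ValueError(f"Unsupported file type: {suffix!r}")
```

Trying UTF-8 first covers the common case, while latin-1 as the last resort guarantees that decoding succeeds, at the cost of possibly mis-rendering characters that were written in some other encoding.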
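The Initial Analysis summary maps closely onto standard pandas calls. A minimal sketch, assuming pandas; the function name `initial_summary` and the dictionary keys are hypothetical. Treating a column as constant when it has at most one distinct value (with NaN counted as a value) is one reasonable definition, not necessarily the tool's.

```python
import pandas as pd


def initial_summary(df):
    """Summarise shape, memory, missing values, duplicates, constant columns and types."""
    return {
        "shape": df.shape,
        # deep=True includes the real size of string (object) contents
        "memory_mb": df.memory_usage(deep=True).sum() / 1024**2,
        "missing_per_column": df.isna().sum().to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
        # nunique(dropna=False) counts NaN as a value, so an all-NaN column is constant
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
        "dtypes": df.dtypes.astype(str).to_dict(),
    }
```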
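For the Data Cleaning options, here is a minimal pandas sketch of mean/median/mode/custom filling and of outlier capping. The README does not say which outlier rule the tool uses; the 1.5 * IQR fences below are an assumption, and `fill_missing` and `cap_outliers_iqr` are hypothetical names.

```python
import pandas as pd


def fill_missing(df, column, strategy="mean", value=None):
    """Fill NaNs in one column with its mean, median, mode, or a custom value."""
    out = df.copy()  # leave the caller's DataFrame untouched
    if strategy == "mean":
        fill = out[column].mean()
    elif strategy == "median":
        fill = out[column].median()
    elif strategy == "mode":
        fill = out[column].mode().iloc[0]  # first mode if several values tie
    elif strategy == "custom":
        fill = value
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    out[column] = out[column].fillna(fill)
    return out


def cap_outliers_iqr(df, column, k=1.5):
    """Clip values outside [Q1 - k*IQR, Q3 + k*IQR] to those bounds."""
    out = df.copy()
    q1, q3 = out[column].quantile([0.25, 0.75])
    iqr = q3 - q1
    out[column] = out[column].clip(lower=q1 - k * iqr, upper=q3 + k * iqr)
    return out
```

The remaining options are one-liners in pandas: `df.dropna()` drops rows with missing values, `df.dropna(axis=1)` drops columns that contain them, and `df.drop_duplicates()` removes duplicate rows.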
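The Feature Engineering step combines scaling with a per-column choice of encoding. The sketch below reproduces the formulas of scikit-learn's StandardScaler and MinMaxScaler in plain pandas so the example needs no extra dependency; whether the tool calls scikit-learn directly is not stated. The `plan` dictionary for choosing One-Hot or Label encoding per column is a hypothetical interface.

```python
import pandas as pd


def standardize(df, columns):
    """Zero mean and unit variance per column, matching StandardScaler's
    formula (population standard deviation, ddof=0)."""
    out = df.copy()
    for col in columns:
        std = out[col].std(ddof=0)
        # A constant column has std 0; divide by 1 instead, as StandardScaler does.
        out[col] = (out[col] - out[col].mean()) / (std if std != 0 else 1.0)
    return out


def min_max_scale(df, columns):
    """Rescale each column to [0, 1], MinMaxScaler's default range."""
    out = df.copy()
    for col in columns:
        lo, hi = out[col].min(), out[col].max()
        span = hi - lo
        # A constant column maps to 0, as it does with MinMaxScaler.
        out[col] = (out[col] - lo) / (span if span != 0 else 1.0)
    return out


def encode(df, plan):
    """Encode categorical columns; `plan` maps column name -> "onehot" or "label"."""
    out = df.copy()
    for col, method in plan.items():
        if method == "onehot":
            # Replaces `col` with one indicator column per category, e.g. color_red.
            out = pd.get_dummies(out, columns=[col], prefix=col)
        elif method == "label":
            # Integer codes in sorted category order; missing values become -1.
            out[col] = out[col].astype("category").cat.codes
        else:
            raise ValueError(f"Unknown encoding for {col!r}: {method!r}")
    return out
```

When the data will later be split into training and test sets, the scaling statistics (mean, standard deviation, minimum, maximum) should be computed on the training split only and then applied to both splits, so that no test information leaks into training.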
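For the Analysis features, a sketch that computes the Pearson correlation matrix of the numeric columns, lists strongly correlated pairs, and draws a heatmap. The 0.8 threshold, the function names, and the use of matplotlib are assumptions; the README does not say which plotting library the tool uses.

```python
import pandas as pd


def correlation_report(df, threshold=0.8):
    """Pearson correlation matrix of numeric columns, plus every pair whose
    absolute correlation is at least `threshold` (an assumed default)."""
    corr = df.select_dtypes("number").corr()
    cols = list(corr.columns)
    strong = [
        (a, b, float(corr.loc[a, b]))
        for i, a in enumerate(cols)
        for b in cols[i + 1:]  # upper triangle only: skip self-pairs and repeats
        if abs(corr.loc[a, b]) >= threshold
    ]
    return corr, strong


def plot_heatmap(corr):
    """Draw the matrix as a heatmap. matplotlib is imported here so that
    correlation_report works without it installed."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    image = ax.imshow(corr.values, vmin=-1, vmax=1, cmap="coolwarm")
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=90)
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax)
    fig.tight_layout()
    return fig
```

For the statistical summary, `df.describe()` reports count, mean, standard deviation, and quantiles for numeric columns, and `df.describe(include="all")` adds count, unique, top, and freq for non-numeric columns.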
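The Export options can be sketched as one writer per file extension, plus JSON files for saving and reloading the recorded steps. The step format (a list of dictionaries such as `{"op": "drop_duplicates"}`) and all function names are hypothetical; the README does not document how the tool stores its steps.

```python
import json
from pathlib import Path

import pandas as pd


def export_data(df, path):
    """Write a DataFrame in the format implied by the file extension."""
    path = Path(path)
    suffix = path.suffix.lower()
    if suffix == ".csv":
        df.to_csv(path, index=False)
    elif suffix == ".xlsx":
        df.to_excel(path, index=False)  # requires openpyxl
    elif suffix == ".json":
        df.to_json(path, orient="records", indent=2)
    elif suffix == ".parquet":
        df.to_parquet(path, index=False)  # requires pyarrow or fastparquet
    else:
        raise ValueError(f"Unsupported export type: {suffix!r}")


def save_steps(steps, path):
    """Persist the list of step dictionaries as JSON."""
    Path(path).write_text(json.dumps(steps, indent=2), encoding="utf-8")


def load_steps(path):
    """Read back a list of step dictionaries saved by save_steps."""
    return json.loads(Path(path).read_text(encoding="utf-8"))
```

Storing steps as plain data rather than as code keeps the saved file readable and lets the same list be replayed on a new dataset.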

Installation

Clone or download this repository.

Install dependencies:

pip install -r requirements.txt

Run the application:

python improved_gui.py

Usage

Load your dataset using the "Load Data" button.

Explore and clean your data using the provided controls.

Engineer features and encode categorical columns as needed.

Analyze your data with correlation and summary tools.

Export your cleaned data or the preprocessing pipeline for reuse.

Notes

The tool is designed for tabular data stored as CSV, Excel, JSON, or Parquet files.

All preprocessing steps are tracked and can be exported as a reproducible Python script.

The GUI is built with Tkinter and uses scrollable views, so large datasets remain browsable.
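Exporting the tracked steps as a reproducible Python script, as the Notes describe, amounts to translating each recorded step into a line of pandas code. A minimal sketch, assuming a CSV input and a hypothetical step format (a list of dictionaries with an `"op"` key); `steps_to_script` is a hypothetical name, and the tool's actual generated code is not shown in this README.

```python
def steps_to_script(steps, input_path, output_path):
    """Render recorded steps as a standalone pandas script.

    The step dictionaries are a hypothetical schema for this sketch; only
    three kinds of step are handled.
    """
    lines = ["import pandas as pd", "", f"df = pd.read_csv({str(input_path)!r})"]
    for step in steps:
        op = step["op"]
        if op == "drop_duplicates":
            lines.append("df = df.drop_duplicates()")
        elif op == "drop_columns":
            lines.append(f"df = df.drop(columns={list(step['columns'])!r})")
        elif op == "fill_missing" and step["strategy"] in ("mean", "median"):
            col = step["column"]
            # The statistic is recomputed when the script runs, not frozen here.
            lines.append(f"df[{col!r}] = df[{col!r}].fillna(df[{col!r}].{step['strategy']}())")
        else:
            raise ValueError(f"Unsupported step: {step!r}")
    lines.append(f"df.to_csv({str(output_path)!r}, index=False)")
    return "\n".join(lines) + "\n"
```

Using `repr()` for paths and column names keeps the generated script valid even when they contain quotes or backslashes, such as Windows paths.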

Requirements

See requirements.txt for the full list of dependencies.

License

This project is for educational and research purposes.

Enjoy fast, interactive, and reproducible data preprocessing!
