Data Joker Preprocessor is an interactive Python GUI tool for robust data preprocessing and exploratory data analysis, designed to help you prepare datasets for machine learning and analytics workflows with minimal code.
Features
Load Data: Supports CSV, Excel, JSON, and Parquet files with robust encoding and error handling.
Initial Analysis: Get a summary of data shape, memory usage, missing values, duplicates, constant columns, and column types.
Target Identification: Easily select your target column for supervised learning or skip for unsupervised tasks.
Data Cleaning:
Handle missing values (drop rows/columns, fill with mean/median/mode/custom value).
Remove duplicates.
Handle outliers (remove, cap, or transform).
Feature Engineering:
Feature scaling (StandardScaler, MinMaxScaler).
Encode categorical columns (choose One-Hot or Label encoding per column).
Drop columns interactively.
Analysis:
Correlation analysis and heatmap visualization.
Statistical summary of numerical and categorical columns.
Export:
Export processed data as CSV, Excel, JSON, or Parquet.
Export the entire preprocessing pipeline as executable Python code.
Save and load preprocessing steps for reproducibility.
User-Friendly GUI:
Scrollable control panel and results area.
Data preview with scrollable table.
Step-by-step tracking of all preprocessing actions.
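For readers who want to see what the cleaning and feature engineering options amount to outside the GUI, here is a minimal pandas sketch. It is not the tool's own code: the function names `clean` and `engineer`, the median/mode fill policy, and the 1.5 × IQR capping rule are assumptions chosen for illustration. Scaling is done with plain arithmetic (z-score with `ddof=0`, which is the same quantity StandardScaler computes) so that the example depends only on pandas.

```python
import pandas as pd


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Fill missing values, drop duplicate rows, and cap numeric outliers."""
    out = df.copy()
    num_cols = out.select_dtypes(include="number").columns

    # Assumed policy: median for numeric columns, mode for everything else.
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if col in num_cols:
            out[col] = out[col].fillna(out[col].median())
        else:
            mode = out[col].mode()
            if not mode.empty:
                out[col] = out[col].fillna(mode.iloc[0])

    out = out.drop_duplicates().reset_index(drop=True)

    # Cap (rather than remove) values outside 1.5 * IQR of each numeric column.
    for col in num_cols:
        q1, q3 = out[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        out[col] = out[col].clip(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out


def engineer(df, one_hot_cols=(), label_cols=(), scale_cols=()):
    """Encode categorical columns and standardize numeric ones."""
    out = df.copy()
    # Label encoding: integer codes assigned in sorted category order.
    for col in label_cols:
        out[col] = out[col].astype("category").cat.codes
    # One-hot encoding: one 0/1 column per category.
    out = pd.get_dummies(out, columns=list(one_hot_cols), dtype=int)
    # Standard scaling: zero mean, unit (population) standard deviation.
    for col in scale_cols:
        std = out[col].std(ddof=0)
        out[col] = (out[col] - out[col].mean()) / (std if std else 1.0)
    return out


# Small made-up dataset: one missing age, one missing city, one extreme age.
raw = pd.DataFrame({
    "age": [20, 22, None, 21, 200, 20],
    "city": ["A", "B", "A", None, "A", "A"],
})
cleaned = clean(raw)
features = engineer(cleaned, one_hot_cols=["city"], scale_cols=["age"])
print(features)
```

Capping keeps every row, which is usually the safer default on small datasets; removing or transforming outliers instead would change only the last loop in `clean`.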
Installation
Clone or download this repository.
Install dependencies:
pip install -r requirements.txt
Run the application:
python improved_gui.py
Usage
Load your dataset using the "Load Data" button.
Explore and clean your data using the provided controls.
Engineer features and encode categorical columns as needed.
Analyze your data with correlation and summary tools.
Export your cleaned data or the preprocessing pipeline for reuse.
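The loading and analysis steps in this workflow correspond roughly to the following pandas calls. This is a hedged sketch, not the application's implementation: `READERS`, `load_table`, `summarize`, and `numeric_correlations` are hypothetical names, and the demo file is created on the fly so that the example is self-contained.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Assumed mapping from file extension to the pandas reader that handles it.
READERS = {
    ".csv": pd.read_csv,
    ".xlsx": pd.read_excel,
    ".xls": pd.read_excel,
    ".json": pd.read_json,
    ".parquet": pd.read_parquet,
}


def load_table(path):
    """Pick a reader based on the file extension and load the table."""
    suffix = Path(path).suffix.lower()
    if suffix not in READERS:
        raise ValueError(f"Unsupported file type: {suffix!r}")
    return READERS[suffix](path)


def summarize(df: pd.DataFrame) -> dict:
    """Collect the kind of overview the initial analysis reports."""
    return {
        "shape": df.shape,
        "memory_bytes": int(df.memory_usage(deep=True).sum()),
        "missing": df.isna().sum().to_dict(),
        "duplicates": int(df.duplicated().sum()),
        "constant_columns": [
            c for c in df.columns if df[c].nunique(dropna=False) <= 1
        ],
        "dtypes": df.dtypes.astype(str).to_dict(),
    }


def numeric_correlations(df: pd.DataFrame) -> pd.DataFrame:
    """Pearson correlation matrix over the numeric columns only."""
    return df.select_dtypes(include="number").corr()


# Demo: write a tiny CSV to a temporary directory, then load and inspect it.
with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "demo.csv"
    csv_path.write_text("x,y,flag\n1,2,a\n2,4,a\n2,4,a\n")
    table = load_table(csv_path)

summary = summarize(table)
corr = numeric_correlations(table)
print(summary)
print(corr)
```

The correlation matrix returned here is the kind of input a heatmap plot would take; the plotting itself is left out to keep the sketch dependency-free.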
Notes
The tool is designed for tabular data in CSV, Excel, JSON, or Parquet format.
All preprocessing steps are tracked and can be exported as a reproducible Python script.
The GUI is built with Tkinter and uses scrollable views for the data preview and results, so long or wide tables stay navigable.
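To illustrate why tracked steps make preprocessing reproducible, here is one possible way to record steps as data and replay them. The JSON layout, the `OPERATIONS` registry, and `apply_steps` are all assumptions made for this example; the format the tool actually saves may differ.

```python
import json

import pandas as pd

# Hypothetical registry: step name -> function(df, **params) returning a new df.
OPERATIONS = {
    "drop_duplicates": lambda df: df.drop_duplicates(),
    "drop_columns": lambda df, columns: df.drop(columns=columns),
    "fill_missing": lambda df, column, value: df.assign(
        **{column: df[column].fillna(value)}
    ),
}


def apply_steps(df: pd.DataFrame, steps: list) -> pd.DataFrame:
    """Replay recorded steps in order on a fresh copy of the data."""
    for step in steps:
        df = OPERATIONS[step["op"]](df, **step.get("params", {}))
    return df


# A recorded session: each action is stored as plain data, not code.
steps = [
    {"op": "fill_missing", "params": {"column": "score", "value": 0}},
    {"op": "drop_columns", "params": {"columns": ["notes"]}},
    {"op": "drop_duplicates"},
]

# Round-trip through JSON, as saving and reloading the steps would.
saved = json.dumps(steps)
restored = json.loads(saved)

raw = pd.DataFrame({"score": [1.0, None, 1.0], "notes": ["a", "b", "c"]})
result = apply_steps(raw, restored)
print(result)
```

Because the steps are plain data, the same list can be applied to a new file with the same columns, or translated line by line into a standalone script.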
Requirements
See requirements.txt for the full list of dependencies.
License
This project is for educational and research purposes.
**Enjoy fast, interactive, and reproducible data preprocessing with