A data engineering project that builds an end-to-end pipeline for processing a jewelry dataset using Python and Apache Airflow.
The pipeline automates the full data workflow: ingesting raw CSV files, loading the data into PostgreSQL, cleaning and transforming it, and exporting a cleaned dataset for further analysis.
The project demonstrates core data engineering concepts including workflow orchestration, data validation, ETL processing, and automated analytics generation. It also includes visualizations to explore patterns in jewelry pricing, categories, and product attributes.
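The four stages described above (ingest, load, clean, export) can be sketched as plain Python functions. This is only an illustration under stated assumptions, not the project's actual code: the column names and sample rows are hypothetical, the standard-library `sqlite3` module stands in for PostgreSQL so the example runs without a database server, and in the real project each function would be wrapped in an Airflow task.

```python
import csv
import io
import sqlite3

# Hypothetical raw CSV content; the real dataset's columns are not stated here.
RAW_CSV = """product_id,category,price
1,ring,120.50
2,necklace,
3,ring,not_a_number
4,earring,75.00
"""

COLUMNS = ["product_id", "category", "price"]


def ingest(text):
    """Parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def load(conn, rows):
    """Load raw rows into a staging table, keeping every value as text."""
    conn.execute(
        "CREATE TABLE raw_jewelry (product_id TEXT, category TEXT, price TEXT)"
    )
    conn.executemany(
        "INSERT INTO raw_jewelry VALUES (:product_id, :category, :price)", rows
    )


def clean(conn):
    """Keep only rows whose price parses as a positive number."""
    cleaned = []
    query = "SELECT product_id, category, price FROM raw_jewelry ORDER BY product_id"
    for product_id, category, price in conn.execute(query):
        try:
            value = float(price)
        except ValueError:
            continue  # empty or non-numeric price: drop the row
        if value > 0:
            cleaned.append(
                {"product_id": int(product_id), "category": category, "price": value}
            )
    return cleaned


def export(rows):
    """Serialize cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


def run_pipeline(text):
    """Run the stages in order against an in-memory database."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, ingest(text))
        return export(clean(conn))
    finally:
        conn.close()


cleaned_csv = run_pipeline(RAW_CSV)
```

Keeping each stage as a separate function with explicit inputs and outputs makes it easier to map stages onto orchestrator tasks and to test them in isolation.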
Key components of the project include:
Automated ETL pipeline using Apache Airflow
Data ingestion from CSV files
Data storage and processing using PostgreSQL
Data cleaning and transformation with Pandas
Exporting cleaned datasets for analytics
Generating charts and insights for exploratory data analysis
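As an illustration of the Pandas cleaning component listed above, here is a minimal sketch. The column names (`category`, `price`), the sample rows, and the specific rules (normalize text, coerce prices to numbers, drop invalid and duplicate rows) are assumptions chosen for the example; the project's actual cleaning logic may differ.

```python
import pandas as pd

# Hypothetical raw rows with the kinds of problems a cleaning step might handle.
raw = pd.DataFrame(
    {
        "category": [" Ring", "ring", "Necklace", None, "ring"],
        "price": ["120.5", "120.5", "abc", "80", "-5"],
    }
)


def clean_jewelry(df):
    """Return a cleaned copy of the input frame (assumed rules, see above)."""
    out = df.copy()
    # Normalize category text so " Ring" and "ring" compare equal.
    out["category"] = out["category"].str.strip().str.lower()
    # Convert prices to numbers; unparseable values become NaN.
    out["price"] = pd.to_numeric(out["price"], errors="coerce")
    # Drop rows missing a category or a usable price.
    out = out.dropna(subset=["category", "price"])
    # Drop non-positive prices, then exact duplicate rows.
    out = out[out["price"] > 0]
    return out.drop_duplicates().reset_index(drop=True)


cleaned = clean_jewelry(raw)
```

With the sample data, only one row survives: the two ring rows collapse into one after normalization, and the remaining rows are dropped for a non-numeric price, a missing category, or a negative price.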
Together, these components show how to build a maintainable, automated data pipeline suited to real-world analytics workflows.