This project covered an end-to-end data pipeline, from ingesting raw data to delivering a fully structured, operational database. The primary objective was to clean, transform, and migrate vehicle and dealership data from a flat-file source into a relational MySQL database.
My key responsibilities at each stage of the process were:
Data Cleansing & Transformation: I used Power Query to clean the raw dataset thoroughly before migration. This included standardizing formats, handling missing values, correcting data types, and checking data integrity.
Database Design: Using MySQL Workbench, I designed a normalized relational schema that stores vehicles, dealerships, sales, customer contacts, and vehicle features in separate tables, with clearly defined relationships between them.
Automated Data Migration: I wrote a custom Python script to automate the migration. It uses the pandas library for data manipulation and a MySQL connector to load the cleaned data into the database in batches, keeping the transfer stable and reliable.
Validation: After the migration, I ran SQL queries to confirm that all data had been loaded accurately into the new schema.
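For the data cleansing step: Power Query applies its transformations through its own editor and M formula language, so the following is not the project's actual query. It is a minimal sketch that restates the same kinds of rules (standardizing formats, correcting types, turning blanks into missing values) in plain Python. The column names make, dealership, price, year, and sale_date are hypothetical, and the accepted date formats are an assumption.

```python
from datetime import datetime

# Assumed input date formats; the real dataset's formats are not documented.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y")


def parse_date(value):
    """Return an ISO 8601 date string, or None if the value is blank or unparseable."""
    if value is None:
        return None
    text = str(value).strip()
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def clean_row(row):
    """Clean one raw record (a dict of strings) using hypothetical column names."""
    # Standardize text: trim whitespace and use consistent capitalization.
    make = (row.get("make") or "").strip().title()
    dealership = (row.get("dealership") or "").strip()
    # Correct data types: strip currency symbols and thousands separators.
    price = (row.get("price") or "").replace("$", "").replace(",", "").strip()
    year = (row.get("year") or "").strip()
    # Handle missing values: blanks become None, which loads as SQL NULL.
    return {
        "make": make or None,
        "dealership": dealership or None,
        "price": float(price) if price else None,
        "year": int(year) if year else None,
        "sale_date": parse_date(row.get("sale_date")),
    }
```

Note that str.title() would turn "BMW" into "Bmw", so a real pipeline would more likely map makes through a lookup table of canonical names.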
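A normalized schema of the kind described in the database design step might look like the sketch below. Every table and column name is hypothetical, since the real ones are not listed. The DDL avoids MySQL-only syntax such as AUTO_INCREMENT so that it also runs on SQLite, which is used here only so the example executes without a MySQL server.

```python
import sqlite3

# Hypothetical normalized schema; names are illustrative only.
# Parent tables come first so foreign keys can reference them.
SCHEMA = [
    """CREATE TABLE dealerships (
        dealership_id INTEGER PRIMARY KEY,
        name VARCHAR(100) NOT NULL,
        city VARCHAR(100)
    )""",
    """CREATE TABLE vehicles (
        vehicle_id INTEGER PRIMARY KEY,
        dealership_id INTEGER NOT NULL,
        make VARCHAR(50) NOT NULL,
        model VARCHAR(50),
        model_year INTEGER,
        price DECIMAL(10, 2),
        FOREIGN KEY (dealership_id) REFERENCES dealerships (dealership_id)
    )""",
    """CREATE TABLE vehicle_features (
        vehicle_id INTEGER NOT NULL,
        feature VARCHAR(100) NOT NULL,
        PRIMARY KEY (vehicle_id, feature),
        FOREIGN KEY (vehicle_id) REFERENCES vehicles (vehicle_id)
    )""",
    """CREATE TABLE customer_contacts (
        contact_id INTEGER PRIMARY KEY,
        full_name VARCHAR(100) NOT NULL,
        email VARCHAR(255)
    )""",
    """CREATE TABLE sales (
        sale_id INTEGER PRIMARY KEY,
        vehicle_id INTEGER NOT NULL,
        contact_id INTEGER,
        sale_date DATE,
        sale_price DECIMAL(10, 2),
        FOREIGN KEY (vehicle_id) REFERENCES vehicles (vehicle_id),
        FOREIGN KEY (contact_id) REFERENCES customer_contacts (contact_id)
    )""",
]


def create_schema(conn):
    """Create all tables in dependency order, then commit."""
    cur = conn.cursor()
    for statement in SCHEMA:
        cur.execute(statement)
    conn.commit()
```

Keeping features in their own vehicle_features table, keyed on (vehicle_id, feature), is one common way to avoid storing a comma-separated feature list in a single column.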
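The batched loading step of the migration script could be sketched as follows. This is an assumption about the general approach, not the project's script: load_in_batches and batched_rows are hypothetical helpers that accept any DB-API 2.0 connection, such as one returned by mysql.connector.connect(...). Each batch is inserted with executemany and committed separately, and a failing batch is rolled back before the error is re-raised.

```python
from itertools import islice


def batched_rows(rows, batch_size):
    """Yield lists of up to batch_size rows from any iterable."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def load_in_batches(conn, table, columns, rows, batch_size=1000, placeholder="%s"):
    """Insert rows batch by batch and return the number of rows inserted.

    mysql.connector uses the "%s" placeholder; sqlite3 uses "?".
    table and columns must be trusted, hard-coded identifiers.
    """
    column_list = ", ".join(columns)
    marks = ", ".join([placeholder] * len(columns))
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({marks})"
    cur = conn.cursor()
    total = 0
    for batch in batched_rows(rows, batch_size):
        try:
            cur.executemany(sql, batch)
            conn.commit()  # each batch is its own transaction
        except Exception:
            conn.rollback()  # discard only the failed batch
            raise
        total += len(batch)
    return total
```

With pandas, df.itertuples(index=False, name=None) yields plain tuples that can be passed as rows, for example load_in_batches(conn, "vehicles", list(df.columns), df.itertuples(index=False, name=None)). Converting NaN to None first, for instance with df.astype(object).where(df.notna(), None), lets missing values load as SQL NULL.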
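The specific validation queries are not listed. The sketch below shows two typical post-migration checks, assuming the expected row counts are taken from the cleaned source file: comparing each table's row count with the source, and counting orphaned foreign keys with a LEFT JOIN. The validate and scalar helpers and their arguments are hypothetical.

```python
def scalar(conn, sql):
    """Run a query that returns a single value and return that value."""
    cur = conn.cursor()
    cur.execute(sql)
    value = cur.fetchall()[0][0]  # fetchall drains the result set fully
    cur.close()
    return value


def validate(conn, expected_counts, fk_checks):
    """Return a list of problems; an empty list means every check passed.

    expected_counts: {table: row count in the cleaned source data}
    fk_checks: [(child_table, fk_column, parent_table, pk_column), ...]
    """
    problems = []
    # Check 1: every source row arrived in its target table.
    for table, expected in expected_counts.items():
        actual = scalar(conn, f"SELECT COUNT(*) FROM {table}")
        if actual != expected:
            problems.append(f"{table}: expected {expected} rows, found {actual}")
    # Check 2: no child row points at a parent key that does not exist.
    for child, fk, parent, pk in fk_checks:
        orphans = scalar(
            conn,
            f"SELECT COUNT(*) FROM {child} c "
            f"LEFT JOIN {parent} p ON c.{fk} = p.{pk} "
            f"WHERE c.{fk} IS NOT NULL AND p.{pk} IS NULL",
        )
        if orphans:
            problems.append(f"{child}.{fk}: {orphans} row(s) reference a missing {parent}.{pk}")
    return problems
```

Table and column names are interpolated directly into the SQL, so they should come from hard-coded configuration rather than user input.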
The result is a clean, well-organized, queryable database that serves as a reliable single source of truth for the company's vehicle and sales operations.