I worked on a sales dataset that had several data quality issues, all of which had to be cleaned and validated before any meaningful analysis was possible.
Quantity Issues:
Fourteen products had missing quantity values. To resolve this, I reviewed similar product records and cross-checked them against the actual sales records. I then filled in the missing quantities with the verified values and consolidated duplicate records for the same product so that each product was represented consistently.
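As a rough illustration of the consolidation step (not the exact workflow used on this dataset), the Python sketch below merges records that share a key and lets a non-missing value fill a gap in its duplicate. The field names (order_id, product_id, quantity, sales_amount), the choice of key, and the sample rows are all hypothetical.

```python
def consolidate(records, key_fields=("order_id", "product_id")):
    """Merge records sharing the same key; fill missing fields from duplicates."""
    merged = {}
    for rec in records:
        key = tuple(rec[f] for f in key_fields)
        if key not in merged:
            merged[key] = dict(rec)  # first occurrence becomes the kept record
            continue
        kept = merged[key]
        for field, value in rec.items():
            # Only fill gaps; never overwrite a value that is already present.
            if kept.get(field) is None and value is not None:
                kept[field] = value
    return list(merged.values())


# Hypothetical sample rows: the first two describe the same order line.
records = [
    {"order_id": 101, "product_id": "P-7", "quantity": None, "sales_amount": 40.0},
    {"order_id": 101, "product_id": "P-7", "quantity": 2, "sales_amount": 40.0},
    {"order_id": 102, "product_id": "P-9", "quantity": None, "sales_amount": 15.0},
]

clean = consolidate(records)
# Products that still need a manual check against the sales records.
still_missing = [r["product_id"] for r in clean if r["quantity"] is None]
```

Anything left in still_missing has no duplicate to borrow from, so it would still need the manual review described above.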
Price Adjustment:
For items where the quantity was missing but the sales amount was recorded, I derived the quantity by dividing the sales amount by the actual unit price. Because the recorded sales amounts were left unchanged, this filled the gaps without distorting total revenue figures.
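The calculation behind this step can be sketched in a few lines of Python. This is only an illustration under assumptions: the field names (quantity, unit_price, sales_amount) and the sample rows are hypothetical, and the rule of accepting only whole-number results is my own safeguard rather than something taken from the original dataset.

```python
from decimal import Decimal


def impute_quantity(record):
    """Return a copy of the record with quantity = sales_amount / unit_price if missing."""
    filled = dict(record)
    if filled.get("quantity") is not None:
        return filled  # nothing to impute
    amount = filled.get("sales_amount")
    price = filled.get("unit_price")
    if amount is None or price is None or Decimal(price) == 0:
        return filled  # not enough information; leave the gap
    qty = Decimal(amount) / Decimal(price)
    # Assumption: quantities are whole units. A fractional result hints at a
    # discount or a wrong price, so the record is left for manual review.
    if qty == qty.to_integral_value():
        filled["quantity"] = int(qty)
    return filled


# Hypothetical sample rows (amounts kept as strings to avoid float rounding).
rows = [
    {"product": "A", "quantity": None, "unit_price": "12.50", "sales_amount": "50.00"},
    {"product": "B", "quantity": 3, "unit_price": "4.00", "sales_amount": "12.00"},
    {"product": "C", "quantity": None, "unit_price": "3.00", "sales_amount": "10.00"},
]

cleaned = [impute_quantity(r) for r in rows]
```

Here product A gets a quantity of 4, product B is untouched, and product C stays empty because 10.00 / 3.00 is not a whole number. The sales amount is never modified, which is why revenue totals are preserved.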
Shipping Date Problem:
Another critical issue was the shipping dates: some records did not accurately reflect the actual delivery period to the customer. After cleaning and correcting these timestamps, I was able to calculate delivery times correctly and detect delays.
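Once the dates are trustworthy, the delivery-time and delay check is simple arithmetic. The Python sketch below shows one way it could look. The field names (ship_date, delivery_date), the 5-day threshold, and the sample orders are hypothetical and not taken from the original data.

```python
from datetime import date

SLA_DAYS = 5  # hypothetical threshold for what counts as a delay


def delivery_days(order):
    """Days from shipping to delivery, or None if a date is missing or inconsistent."""
    shipped = order.get("ship_date")
    delivered = order.get("delivery_date")
    if shipped is None or delivered is None or delivered < shipped:
        return None  # cannot be trusted; needs manual correction
    return (delivered - shipped).days


# Hypothetical sample orders; order 3 is "delivered" before it shipped.
orders = [
    {"order_id": 1, "ship_date": date(2024, 3, 1), "delivery_date": date(2024, 3, 4)},
    {"order_id": 2, "ship_date": date(2024, 3, 2), "delivery_date": date(2024, 3, 10)},
    {"order_id": 3, "ship_date": date(2024, 3, 5), "delivery_date": date(2024, 3, 3)},
]

durations = {o["order_id"]: delivery_days(o) for o in orders}
delayed = [oid for oid, d in durations.items() if d is not None and d > SLA_DAYS]
needs_review = [oid for oid, d in durations.items() if d is None]
```

Keeping inconsistent records in a separate needs_review list, rather than dropping them, makes it easy to see which orders still have date problems.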
After cleaning, I built an interactive dashboard (using Power BI/Excel) to visualize:
Sales performance by product and category.
The corrected order quantities and revenues.
Delivery performance based on the fixed shipping dates.
This process improved the accuracy and reliability of the dataset and also surfaced insights that had been hidden, such as which products had recurring data entry issues and how shipping delays affected customer experience.