We focused on building a scalable data pipeline that extracts, transforms, and loads (ETL) data into a centralized data warehouse.
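A minimal sketch of that ETL flow, assuming a local SQLite file stands in for the warehouse and `sales.csv` is a hypothetical source extract:

```python
import sqlite3
import pandas as pd

def extract(path: str) -> pd.DataFrame:
    # Pull raw records from a source file (a hypothetical CSV here).
    return pd.read_csv(path)

def transform(df: pd.DataFrame) -> pd.DataFrame:
    # Clean and reshape: drop incomplete rows, normalize column names.
    df = df.dropna()
    df.columns = [c.strip().lower().replace(" ", "_") for c in df.columns]
    return df

def load(df: pd.DataFrame, conn: sqlite3.Connection, table: str) -> None:
    # Append the cleaned records to the warehouse table.
    df.to_sql(table, conn, if_exists="append", index=False)

if __name__ == "__main__":
    conn = sqlite3.connect("warehouse.db")  # stand-in for the real warehouse
    load(transform(extract("sales.csv")), conn, "sales")
    conn.close()
```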
We processed a variety of data formats, including CSV, XML, and JSON, using SQL, Python, Hadoop, and Apache Spark to handle data at scale.
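In Python, each of those formats can be parsed into the same tabular structure before further processing; the file names below are placeholders rather than the project's actual sources:

```python
import pandas as pd

# Parse each source format into a DataFrame (read_xml requires lxml, pandas >= 1.3).
orders_csv = pd.read_csv("orders.csv")
orders_xml = pd.read_xml("orders.xml")
orders_json = pd.read_json("orders.json")

# Assuming the three sources share the same columns, they can be
# combined and handled uniformly from here on.
orders = pd.concat([orders_csv, orders_xml, orders_json], ignore_index=True)
print(orders.head())
```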
We used Hadoop to store and manage large datasets efficiently, while Apache Spark let us process and analyze that data in parallel.
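A sketch of how that split between storage and processing might look in PySpark, with Spark reading from and writing back to HDFS; the paths, column names, and aggregation are illustrative assumptions, not the project's actual layout:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("etl-aggregation").getOrCreate()

# Hadoop (HDFS) holds the raw data; Spark reads and processes it in parallel.
events = spark.read.json("hdfs:///data/raw/events/")

# Distributed aggregation: daily event counts per category.
daily_counts = (
    events
    .withColumn("event_date", F.to_date("timestamp"))
    .groupBy("event_date", "category")
    .agg(F.count("*").alias("event_count"))
)

# Write the curated result back to HDFS for downstream use.
daily_counts.write.mode("overwrite").parquet("hdfs:///data/curated/daily_counts/")
spark.stop()
```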
By leveraging these big data technologies, we transformed raw data into actionable insights.
Finally, we visualized these insights using Python and Power BI, making the results easy to interpret and useful for decision-making.
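The Power BI reports are built interactively, but on the Python side a chart of the pipeline's output might look roughly like the sketch below; the Parquet export and column names are assumptions for illustration:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical export of the curated daily aggregates (requires pyarrow or fastparquet).
daily = pd.read_parquet("daily_counts.parquet")

# One line per category, event counts over time.
pivot = daily.pivot_table(index="event_date", columns="category",
                          values="event_count", aggfunc="sum")

pivot.plot(kind="line", figsize=(10, 5))
plt.title("Daily events by category")
plt.xlabel("Date")
plt.ylabel("Events")
plt.tight_layout()
plt.savefig("daily_events.png")
```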