Work details

I’ve just completed a new Data Engineering task as part of my training journey!

The scenario: working as a Data Engineer for one of the biggest markets in Egypt, with the following data sources:

Products stored in a JSON file (id, name, price)

Transactions stored in multiple CSV files (1,000 transactions per file; the initial load was 12,000 transactions, i.e. 3 days of data)

Customers stored in a SQL Server database table (2,000 customers)
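As a rough illustration of reading these three sources with Pandas, here is a minimal sketch. The sample rows, the transaction and customer column names, and the use of an in-memory SQLite database as a stand-in for SQL Server are all assumptions made so the example runs on its own; only the product fields (id, name, price) come from the task description.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for the real files.
products_json = '[{"id": 1, "name": "Milk", "price": 30.0}, {"id": 2, "name": "Rice", "price": 45.5}]'
csv_file_1 = (
    "transaction_id,product_id,customer_id,quantity,transaction_date,branch\n"
    "1,1,10,2,2024-01-01,Cairo\n"
)
csv_file_2 = (
    "transaction_id,product_id,customer_id,quantity,transaction_date,branch\n"
    "2,2,11,1,2024-01-01,Giza\n"
)

# Products: one JSON document with id, name, price.
products = pd.read_json(io.StringIO(products_json))

# Transactions: several CSV files concatenated into one frame.
transactions = pd.concat(
    [pd.read_csv(io.StringIO(text)) for text in (csv_file_1, csv_file_2)],
    ignore_index=True,
)

# Customers: read from a database table. SQLite is used here only as a
# stand-in; with SQL Server the connection object would differ.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(10, "Mona", "Ali"), (11, "Omar", "Hassan")],
)
customers = pd.read_sql_query(
    "SELECT customer_id, first_name, last_name FROM customers", conn
)
conn.close()
```

In the real task the CSV strings would be file paths, and the query would run against the SQL Server customers table.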

My task was to build and manage an Orders table:

(transaction_id, product_name, amount, customer_full_name, transaction_date, branch)
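One way to derive these Orders columns is to join the three sources in Pandas. The sketch below is illustrative only: the key columns (product_id, customer_id), the quantity column, the first_name/last_name split, and the rule amount = price × quantity are assumptions, not details given in the task.

```python
import pandas as pd

# Hypothetical sample frames; column names other than the product
# fields (id, name, price) are assumed for illustration.
products = pd.DataFrame({"id": [1, 2], "name": ["Milk", "Rice"], "price": [30.0, 45.5]})
customers = pd.DataFrame(
    {"customer_id": [10, 11], "first_name": ["Mona", "Omar"], "last_name": ["Ali", "Hassan"]}
)
transactions = pd.DataFrame(
    {
        "transaction_id": [1, 2],
        "product_id": [1, 2],
        "customer_id": [10, 11],
        "quantity": [2, 1],
        "transaction_date": ["2024-01-01", "2024-01-01"],
        "branch": ["Cairo", "Giza"],
    }
)


def build_orders(transactions, products, customers):
    """Join transactions to products and customers, then shape the Orders columns."""
    merged = transactions.merge(
        products, left_on="product_id", right_on="id", how="left"
    ).merge(customers, on="customer_id", how="left")
    merged["product_name"] = merged["name"]
    # Assumed rule: amount is unit price times quantity.
    merged["amount"] = merged["price"] * merged["quantity"]
    merged["customer_full_name"] = merged["first_name"] + " " + merged["last_name"]
    return merged[
        [
            "transaction_id",
            "product_name",
            "amount",
            "customer_full_name",
            "transaction_date",
            "branch",
        ]
    ]


orders = build_orders(transactions, products, customers)
print(orders)
```

Left joins keep every transaction even when a product or customer lookup fails, which makes missing reference data visible as NaN rather than silently dropping rows.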

Key steps I implemented:

1️⃣ Initial Load → Inserted the first 12,000 transactions into the Orders table.

2️⃣ Delta Load → Inserted the new day’s transactions (incremental insert).

3️⃣ Product Price Update → Applied changes to reflect the updated product price in the Orders table using an UPDATE statement.

4️⃣ Change Data Capture (CDC) → On day 3, transaction files included both modified old transactions and new ones. I used a MERGE statement to update existing records and insert new ones.
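To make the difference between the delta load (step 2) and the CDC step (step 4) concrete, here is a small Pandas analogue. It is not the SQL Server MERGE statement used in the task, only a sketch of the same update-or-insert logic; the `upsert` helper, the sample rows, and the reduced column set are hypothetical.

```python
import pandas as pd

# A subset of the Orders columns, for brevity (hypothetical sample rows).
orders = pd.DataFrame(
    {
        "transaction_id": [1, 2],
        "product_name": ["Milk", "Rice"],
        "amount": [60.0, 45.5],
        "branch": ["Cairo", "Giza"],
    }
)


def upsert(target, incoming, key="transaction_id"):
    """Rows in `incoming` replace target rows with the same key; new keys are appended."""
    combined = pd.concat([target, incoming], ignore_index=True)
    return (
        combined.drop_duplicates(subset=key, keep="last")
        .sort_values(key)
        .reset_index(drop=True)
    )


# Delta load: the new day's rows are all new, so a plain append is enough.
new_day = pd.DataFrame(
    {"transaction_id": [3], "product_name": ["Milk"], "amount": [30.0], "branch": ["Cairo"]}
)
orders = pd.concat([orders, new_day], ignore_index=True)

# CDC batch: transaction 2 is a modified old row, transaction 4 is new.
cdc_batch = pd.DataFrame(
    {
        "transaction_id": [2, 4],
        "product_name": ["Rice", "Rice"],
        "amount": [91.0, 45.5],
        "branch": ["Giza", "Alex"],
    }
)
orders = upsert(orders, cdc_batch)
print(orders)
```

A plain append on the CDC batch would leave two rows for transaction 2, which is why that step needs update-or-insert semantics rather than an incremental insert.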

Skills gained & applied:

ETL processes using Pandas & SQL Server

Data integration from JSON, CSV, and SQL DB

Handling incremental & CDC scenarios in real-world datasets
