An end-to-end data engineering project for a telecom environment that processes high-frequency network transaction logs and loads them into a structured SQL Server data warehouse.
The system processes batch files generated every 5 minutes from cell towers, applies data validation and business transformation rules, and separates clean and erroneous records for reliable analytics and auditing.
Key Workflow:
Data Extraction: Reads raw pipe-delimited transaction files from network logs.
Data Transformation (Python):
Validates required fields and timestamp formats
Extracts the TAC (Type Allocation Code) and SNR (serial number) from IMEI numbers
Enriches data using IMSI-to-Subscriber mapping
Routes records into valid and error streams
Archives processed files so the same batch is not loaded twice
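The transformation steps can be sketched per record roughly as follows, using only the standard library. This is a minimal illustration, not the project's actual code: the column layout in FIELDS, the REQUIRED list, the timestamp format, and the function names are all assumptions, since the log schema is not given here. The IMEI split relies on the standard layout of an 8-digit TAC followed by a 6-digit serial number and a check digit.

```python
from datetime import datetime

# Hypothetical column layout; the real log schema is not given in this README.
FIELDS = ["transaction_id", "imsi", "imei", "cell_id", "event_time"]
REQUIRED = ["transaction_id", "imsi", "imei", "event_time"]
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format


def split_imei(imei):
    """Return (TAC, SNR) from a 15-digit IMEI: 8-digit TAC, then 6-digit serial."""
    if len(imei) != 15 or not (imei.isascii() and imei.isdigit()):
        raise ValueError("IMEI must be 15 digits")
    return imei[:8], imei[8:14]


def transform_line(line, subscribers):
    """Return ("valid", record) or ("error", details) for one raw pipe-delimited line."""
    parts = line.rstrip("\r\n").split("|")
    if len(parts) != len(FIELDS):
        return "error", {"raw": line, "reason": "wrong field count"}
    record = dict(zip(FIELDS, (p.strip() for p in parts)))

    # Required-field check: empty strings count as missing.
    missing = [name for name in REQUIRED if not record[name]]
    if missing:
        return "error", {"raw": line, "reason": "missing: " + ",".join(missing)}

    try:
        # strptime raises ValueError when the timestamp does not match the format.
        datetime.strptime(record["event_time"], TIME_FORMAT)
        record["tac"], record["snr"] = split_imei(record["imei"])
    except ValueError as exc:
        return "error", {"raw": line, "reason": str(exc)}

    # Enrichment: an unknown IMSI is kept with a null subscriber in this sketch;
    # the project's real rule for unmapped IMSIs may differ.
    record["subscriber_id"] = subscribers.get(record["imsi"])
    return "valid", record
```

A caller would loop over each line of a batch file, append "valid" records to the clean output and "error" records (with the raw line and reason) to the reject output, then move the source file to an archive folder.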
Data Loading (SQL Server):
Uses BULK INSERT for high-performance loading
Stores clean data in a fact table for analytics
Stores rejected records in an error table with detailed audit information
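As an illustration of the loading step, the sketch below builds a BULK INSERT statement for a pipe-delimited file from Python. The helper name build_bulk_insert, the table name, and the file path are hypothetical, and how the project actually submits the statement to SQL Server is not described here, so only the statement text is produced.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_bulk_insert(table, file_path, field_terminator="|"):
    """Build a T-SQL BULK INSERT statement for a delimited file (hypothetical helper)."""
    # Table names cannot be passed as query parameters, so accept only plain identifiers.
    parts = table.split(".")
    if not all(_IDENT.match(p) for p in parts):
        raise ValueError("unexpected table name: " + table)
    quoted_table = ".".join("[" + p + "]" for p in parts)

    # Double any single quote so the values stay inside their T-SQL string literals.
    safe_path = file_path.replace("'", "''")
    safe_term = field_terminator.replace("'", "''")

    return (
        f"BULK INSERT {quoted_table}\n"
        f"FROM '{safe_path}'\n"
        f"WITH (FIELDTERMINATOR = '{safe_term}', ROWTERMINATOR = '\\n', TABLOCK);"
    )


if __name__ == "__main__":
    # Table and path are placeholders for illustration only.
    print(build_bulk_insert("dbo.FactTransactions", r"C:\etl\valid\batch.txt"))
```

Note that BULK INSERT reads the file from the SQL Server instance's point of view, so the path must be reachable by the server, not only by the machine running the Python script. The same helper could target the error table by passing a different table name.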
Tech Stack:
Python (standard libraries only), Microsoft SQL Server, T-SQL, Dimensional Modeling, Batch ETL Processing
This solution is designed for data integrity, scalability, and efficient handling of high-volume telecom transaction data for reporting and analysis.