End-to-End Customer Churn Prediction Pipeline with SSIS, Data Warehouse, and Airflow Orchestration

Project Details

This project presents an end-to-end data engineering and machine learning pipeline for customer churn prediction, designed to simulate a real-world production environment.

The pipeline begins by generating raw data and preprocessing it in Python, which covers data cleaning, transformation, and feature preparation. The processed data is then loaded into a relational database that serves as the OLTP layer.
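
The cleaning step could look roughly like the sketch below. It uses only the standard library, and the column names (`customer_id`, `tenure`, `monthly_charges`, `churn`) are assumptions made for illustration, as is the choice to keep unparseable numbers as missing values rather than impute them.

```python
from typing import Optional


def to_float(value) -> Optional[float]:
    """Parse a numeric string; return None for blanks, None, or unparseable text."""
    try:
        return float(value.strip())
    except (ValueError, AttributeError):
        return None


def clean_record(raw: dict) -> Optional[dict]:
    """Normalise one raw row; drop it when the business key is missing."""
    customer_id = (raw.get("customer_id") or "").strip()
    if not customer_id:
        return None
    tenure = to_float(raw.get("tenure"))
    return {
        "customer_id": customer_id,
        "tenure_months": int(tenure) if tenure is not None else None,
        "monthly_charges": to_float(raw.get("monthly_charges")),
        # Map the Yes/No label to 1/0 so it can be used as a model target.
        "churned": 1 if (raw.get("churn") or "").strip().lower() == "yes" else 0,
    }


# Hypothetical raw rows: padded key, missing key, and unparseable numbers.
raw_rows = [
    {"customer_id": " C001 ", "tenure": "12", "monthly_charges": "70.5", "churn": "Yes"},
    {"customer_id": "", "tenure": "3", "monthly_charges": "20", "churn": "No"},
    {"customer_id": "C002", "tenure": " ", "monthly_charges": "n/a", "churn": "no"},
]
cleaned = [row for row in map(clean_record, raw_rows) if row is not None]
```

Rows without a customer identifier are dropped here because they could not be matched to a dimension record later in the ETL.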

An ETL process is implemented using SQL Server Integration Services (SSIS) to move and transform data into a Data Warehouse. A star schema design is used, including multiple dimension tables (Customer, Services, Contract, Payment) and a central fact table.
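
One possible shape for this star schema is sketched below. SQLite (via Python's built-in `sqlite3`) stands in for SQL Server so the example runs anywhere; every table and column name is an assumption, since the description only names the four dimensions and the fact table.

```python
import sqlite3

# Four dimension tables plus one fact table that references them
# through surrogate keys. All names are illustrative.
STAR_SCHEMA_DDL = """
CREATE TABLE dim_customer (
    customer_key INTEGER PRIMARY KEY,   -- surrogate key
    customer_id  TEXT NOT NULL,         -- business key from the OLTP source
    gender       TEXT
);
CREATE TABLE dim_services (
    services_key     INTEGER PRIMARY KEY,
    internet_service TEXT,
    phone_service    TEXT
);
CREATE TABLE dim_contract (
    contract_key  INTEGER PRIMARY KEY,
    contract_type TEXT
);
CREATE TABLE dim_payment (
    payment_key    INTEGER PRIMARY KEY,
    payment_method TEXT
);
CREATE TABLE fact_churn (
    customer_key    INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    services_key    INTEGER NOT NULL REFERENCES dim_services(services_key),
    contract_key    INTEGER NOT NULL REFERENCES dim_contract(contract_key),
    payment_key     INTEGER NOT NULL REFERENCES dim_payment(payment_key),
    tenure_months   INTEGER,
    monthly_charges REAL,
    churned         INTEGER,
    churn_score     REAL                -- left NULL until the model writes it
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(STAR_SCHEMA_DDL)

# List the tables that were created, in alphabetical order.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
```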

The SSIS packages apply several ETL techniques:

- Data type conversion and cleansing

- Lookup transformations for surrogate key mapping

- Slowly Changing Dimensions (SCD) handling for managing updates and new records
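
The logic behind the Lookup and SCD steps can be illustrated outside SSIS. The sketch below assumes a Type 2 dimension (history is kept by expiring the old row), which the description does not specify, and uses SQLite with hypothetical table and column names. It is not a reproduction of the SSIS components themselves.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
        customer_id   TEXT NOT NULL,                      -- business key
        contract_type TEXT,
        is_current    INTEGER NOT NULL DEFAULT 1
    )
""")


def lookup_key(customer_id):
    """Return the current surrogate key for a business key, or None if unknown."""
    row = conn.execute(
        "SELECT customer_key FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    return row[0] if row else None


def upsert_scd2(customer_id, contract_type):
    """Insert new customers; on a changed attribute, expire the old row and add a new one."""
    row = conn.execute(
        "SELECT customer_key, contract_type FROM dim_customer "
        "WHERE customer_id = ? AND is_current = 1",
        (customer_id,),
    ).fetchone()
    if row and row[1] == contract_type:
        return row[0]  # nothing changed, keep the existing key
    if row:
        # Attribute changed: mark the previous version as no longer current.
        conn.execute(
            "UPDATE dim_customer SET is_current = 0 WHERE customer_key = ?",
            (row[0],),
        )
    cur = conn.execute(
        "INSERT INTO dim_customer (customer_id, contract_type) VALUES (?, ?)",
        (customer_id, contract_type),
    )
    return cur.lastrowid


first_key = upsert_scd2("C001", "Month-to-month")  # new record
same_key = upsert_scd2("C001", "Month-to-month")   # unchanged record
new_key = upsert_scd2("C001", "One year")          # changed attribute
history = conn.execute(
    "SELECT customer_key, contract_type, is_current "
    "FROM dim_customer ORDER BY customer_key"
).fetchall()
```

When fact rows are loaded, `lookup_key` plays the role of the Lookup transformation: it swaps the business key for the surrogate key of the current dimension row.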

The Data Warehouse serves as the main source for feature extraction, where SQL queries are used to prepare model-ready datasets.
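
A feature-extraction query of this kind might join the fact table to a dimension and encode a categorical attribute as a numeric flag. The sketch below again uses SQLite as a stand-in for SQL Server; the tables, columns, and sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Minimal hypothetical warehouse: one dimension and the fact table.
conn.executescript("""
CREATE TABLE dim_contract (contract_key INTEGER PRIMARY KEY, contract_type TEXT);
CREATE TABLE fact_churn (
    customer_key    INTEGER,
    contract_key    INTEGER,
    tenure_months   INTEGER,
    monthly_charges REAL,
    churned         INTEGER
);
INSERT INTO dim_contract VALUES (1, 'Month-to-month'), (2, 'Two year');
INSERT INTO fact_churn VALUES (10, 1, 3, 85.0, 1), (11, 2, 48, 42.5, 0);
""")

# Join facts to the contract dimension and turn the contract type
# into a 0/1 feature the model can consume directly.
FEATURE_QUERY = """
SELECT f.customer_key,
       f.tenure_months,
       f.monthly_charges,
       CASE WHEN c.contract_type = 'Month-to-month' THEN 1 ELSE 0 END
           AS is_month_to_month,
       f.churned
FROM fact_churn AS f
JOIN dim_contract AS c ON c.contract_key = f.contract_key
ORDER BY f.customer_key
"""

feature_rows = conn.execute(FEATURE_QUERY).fetchall()
```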

A machine learning model is then built using Python and Scikit-learn to predict customer churn based on historical data. The predictions are written back into the Data Warehouse (fact table) as churn scores to support analytics and business decision-making.
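
As a minimal sketch of the train, score, and write-back steps: the description does not say which Scikit-learn estimator is used, so logistic regression is an assumption here, as are the three features, the tiny synthetic training set, and the `fact_churn` table (held in SQLite instead of SQL Server).

```python
import sqlite3

from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic training data. Columns: tenure_months, monthly_charges, is_month_to_month.
X_train = [
    [2, 90.0, 1], [5, 85.0, 1], [1, 95.0, 1], [6, 80.0, 1],
    [48, 40.0, 0], [60, 35.0, 0], [40, 50.0, 0], [70, 30.0, 0],
]
y_train = [1, 1, 1, 1, 0, 0, 0, 0]  # 1 = churned

# Scale the features, then fit a logistic regression classifier.
model = make_pipeline(StandardScaler(), LogisticRegression())
model.fit(X_train, y_train)

# Customers to score, keyed by a hypothetical surrogate key.
to_score = {101: [3, 88.0, 1], 102: [55, 38.0, 0]}
probabilities = model.predict_proba(list(to_score.values()))[:, 1]  # P(churn)
scores = {key: float(p) for key, p in zip(to_score, probabilities)}

# Write the scores back to the fact table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE fact_churn (customer_key INTEGER PRIMARY KEY, churn_score REAL)"
)
conn.executemany(
    "INSERT INTO fact_churn (customer_key) VALUES (?)", [(k,) for k in scores]
)
conn.executemany(
    "UPDATE fact_churn SET churn_score = ? WHERE customer_key = ?",
    [(score, key) for key, score in scores.items()],
)
stored = dict(conn.execute("SELECT customer_key, churn_score FROM fact_churn"))
```

Storing the probability rather than a hard 0/1 label lets analysts choose their own risk threshold when querying the warehouse.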

To automate the workflow end to end, Apache Airflow orchestrates the pipeline. It schedules and manages the execution of every component: preprocessing scripts, SSIS package execution, feature extraction, and machine learning prediction. The result is an automated, production-like data pipeline.
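
A DAG for these four stages might be wired as in the sketch below, assuming Airflow 2.4 or later. The script names, the `ChurnETL.dtsx` package name, the DAG id, and the daily schedule are all hypothetical. Running the SSIS package through `dtexec` in a `BashOperator` is also an assumption: it only works if the Airflow worker can invoke `dtexec` on a machine where SSIS is installed.

```python
# Task id -> shell command. All script and package names are hypothetical.
TASK_COMMANDS = {
    "preprocess": "python preprocess.py",
    "run_ssis_etl": "dtexec /F ChurnETL.dtsx",
    "extract_features": "python extract_features.py",
    "predict_churn": "python predict_churn.py",
}

# (upstream, downstream) pairs describing the order of execution.
EDGES = [
    ("preprocess", "run_ssis_etl"),
    ("run_ssis_etl", "extract_features"),
    ("extract_features", "predict_churn"),
]


def build_dag():
    """Build the Airflow DAG from the task table and dependency list above."""
    # Imported inside the function so this module can be read and checked
    # on a machine without Airflow installed.
    from datetime import datetime

    from airflow import DAG
    from airflow.operators.bash import BashOperator

    dag = DAG(
        dag_id="churn_pipeline",
        start_date=datetime(2024, 1, 1),
        schedule="@daily",  # assumed cadence
        catchup=False,
    )
    tasks = {
        task_id: BashOperator(task_id=task_id, bash_command=command, dag=dag)
        for task_id, command in TASK_COMMANDS.items()
    }
    for upstream, downstream in EDGES:
        tasks[upstream] >> tasks[downstream]
    return dag
```

In an actual DAG file, `dag = build_dag()` would be called at module level so that the Airflow scheduler can discover the DAG.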

Key Features:

- End-to-End Data Pipeline (Raw Data → ETL → Data Warehouse → ML → Predictions)

- Star Schema Data Modeling

- SSIS-based ETL with Lookup and SCD implementation

- Feature Engineering and Churn Prediction Model

- Workflow Orchestration using Apache Airflow

- Production-like architecture design
