تفاصيل العمل

SQL Data Warehouse Project

A comprehensive SQL-based data warehouse implementation following the Medallion Architecture (Bronze, Silver, Gold layers) for analytics and business intelligence. This project demonstrates best practices for data ingestion, transformation, and analytical modeling using dimensional modeling techniques.

? Table of Contents

Overview

Architecture

Prerequisites

Project Structure

Getting Started

Data Layers

Data Model

Usage

Testing

Documentation

License

? Overview

This project implements a modern data warehouse solution that:

Ingests data from multiple source systems (CRM and ERP)

Applies data quality transformations

Creates analytics-ready dimensional models

Supports business intelligence and reporting use cases

The data warehouse processes customer, product, and sales information to enable comprehensive sales analytics.

?️ Architecture

The project follows the Medallion Architecture pattern with three distinct layers:

┌─────────────┐

│ Sources │ (CRM & ERP Systems)

└──────┬──────┘

┌─────────────┐

│ Bronze │ (Raw Data Ingestion)

│ Layer │ - crm_sales_details

│ │ - crm_prd_info

│ │ - crm_cust_info

│ │ - erp_px_cat_g1v2

│ │ - erp_loc_a101

│ │ - erp_cust_az12

└──────┬──────┘

┌─────────────┐

│ Silver │ (Cleaned & Standardized)

│ Layer │ - Data type conversions

│ │ - Date formatting

│ │ - Audit columns added

└──────┬──────┘

┌─────────────┐

│ Gold │ (Analytics-Ready)

│ Layer │ - dim_customers

│ │ - dim_products

│ │ - fact_sales

└─────────────┘

Layer Responsibilities

Bronze Layer: Raw data ingestion with minimal transformation

Silver Layer: Data cleansing, standardization, and type casting

Gold Layer: Business-level dimensional models (Star Schema)

? Prerequisites

Microsoft SQL Server (2016 or later)

SQL Server Management Studio (SSMS) or Azure Data Studio

Access to source data files:

/data/source_crm/prd_info.csv

/data/source_crm/sales_details.csv

/data/source_crm/cust_info.csv

/data/source_erp/px_cat_g1v2.csv

/data/source_erp/loc_a101.csv

/data/source_erp/cust_az12.csv

? Project Structure

sql-data-warehouse-project/

├── scripts/

│ ├── bronze/

│ │ ├── creating_ddl.sql # Create Bronze layer tables

│ │ └── load_bronze.sql # Load data from source files

│ ├── silver/

│ │ ├── ddl_silver.sql # Create Silver layer tables

│ │ └── transform_silver.sql # Transform Bronze → Silver

│ └── gold/

│ └── final_ready_to_analitics.sql # Create Gold layer views

├── tests/

│ └── gol_layer_test.sql # Data quality tests for Gold layer

├── docs/

│ └── data_catalog.md # Detailed data dictionary

├── LICENSE

└── README.md

? Getting Started

Step 1: Create Database Schemas

Create three schemas in your SQL Server database:

CREATE SCHEMA bronze;

CREATE SCHEMA silver;

CREATE SCHEMA gold;

Step 2: Setup Bronze Layer

Run the Bronze layer scripts in order:

-- 1. Create Bronze tables

EXEC sp_executesql @sql = N'$(cat scripts/bronze/creating_ddl.sql)';

-- 2. Load data from source files

EXEC bronze.load_bronze;

Step 3: Setup Silver Layer

Run the Silver layer scripts:

-- 1. Create Silver tables

EXEC sp_executesql @sql = N'$(cat scripts/silver/ddl_silver.sql)';

-- 2. Transform Bronze to Silver

EXEC sp_executesql @sql = N'$(cat scripts/silver/transform_silver.sql)';

Step 4: Setup Gold Layer

Create the Gold layer dimensional model:

-- Create Gold layer views (dimensions and facts)

EXEC sp_executesql @sql = N'$(cat scripts/gold/final_ready_to_analitics.sql)';

Step 5: Verify Installation

Run the test suite to verify data quality:

-- Run Gold layer tests

EXEC sp_executesql @sql = N'$(cat tests/gol_layer_test.sql)';

? Data Layers

Bronze Layer (Raw Data)

Tables in the Bronze layer store raw data from source systems:

bronze.crm_sales_details - Sales transactions from CRM

bronze.crm_prd_info - Product information from CRM

bronze.crm_cust_info - Customer information from CRM

bronze.erp_px_cat_g1v2 - Product categories from ERP

bronze.erp_loc_a101 - Customer location data from ERP

bronze.erp_cust_az12 - Customer demographics from ERP

Silver Layer (Cleaned Data)

Tables in the Silver layer contain cleaned and standardized data:

Type casting (e.g., integer dates → DATE types)

Standardized naming conventions

Added audit columns (dwh_create_date)

Data quality improvements

Gold Layer (Analytics-Ready)

Views in the Gold layer provide business-level dimensional models:

Dimensions

gold.dim_customers - Customer dimension with demographics and location

gold.dim_products - Product dimension with categories and attributes

Facts

gold.fact_sales - Sales transactions with foreign keys to dimensions

? Data Model

The Gold layer implements a Star Schema for optimized analytical queries:

┌──────────────────┐

│ dim_customers │

│ ───────────── │

│ customer_key PK │

│ customer_id │

│ first_name │

│ last_name │

│ country │

│ marital_status │

│ gender │

│ birthdate │

└────────┬─────────┘

┌────────▼─────────┐

│ fact_sales │

│ ────────── │

│ order_number │

│ product_key FK │

│ customer_key FK│

│ order_date │

│ shipping_date │

│ due_date │

│ sales_amount │

│ quantity │

│ price │

└────────┬─────────┘

┌────────▼─────────┐

│ dim_products │

│ ──────────── │

│ product_key PK │

│ product_id │

│ product_name │

│ category │

│ subcategory │

│ cost │

│ product_line │

└──────────────────┘

? Usage

Querying the Data Warehouse

Example analytical queries:

-- Total sales by customer

SELECT

c.first_name,

c.last_name,

SUM(f.sales_amount) as total_sales,

COUNT(f.order_number) as order_count

FROM gold.fact_sales f

JOIN gold.dim_customers c ON f.customer_key = c.customer_key

GROUP BY c.first_name, c.last_name

ORDER BY total_sales DESC;

-- Product performance by category

SELECT

p.category,

p.subcategory,

SUM(f.sales_amount) as total_revenue,

SUM(f.quantity) as units_sold

FROM gold.fact_sales f

JOIN gold.dim_products p ON f.product_key = p.product_key

GROUP BY p.category, p.subcategory

ORDER BY total_revenue DESC;

-- Monthly sales trend

SELECT

FORMAT(f.order_date, 'yyyy-MM') as month,

SUM(f.sales_amount) as monthly_sales,

COUNT(DISTINCT f.order_number) as order_count

FROM gold.fact_sales f

GROUP BY FORMAT(f.order_date, 'yyyy-MM')

ORDER BY month;

? Testing

The project includes data quality tests located in tests/gol_layer_test.sql:

Test Coverage

Uniqueness Tests

Validates customer_key uniqueness in dim_customers

Validates product_key uniqueness in dim_products

Referential Integrity Tests

Ensures all fact_sales records have valid dimension keys

Checks for orphaned records

Running Tests

-- Execute all tests

EXEC sp_executesql @sql = N'$(cat tests/gol_layer_test.sql)';

All test queries should return zero rows if data quality is maintained.

? Documentation

Detailed documentation is available in the docs/ directory:

Data Catalog - Complete data dictionary for the Gold layer, including:

Table descriptions

Column definitions

Data types

Business rules

? Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Development Guidelines

Follow the three-layer architecture pattern

Add appropriate audit columns to Silver layer tables

Document any new tables or views in the data catalog

Include data quality tests for new entities

? License

This project is licensed under the MIT License - see the LICENSE file for details.

Copyright (c) 2026 Ahmed Wael Galal

? Author

Ahmed Wael Galal

? Email: ahmedwaelgalal2@gmail.com

? Support

For questions or issues, please open an issue in the GitHub repository.

Built with ❤️ for data engineering and analytics

بطاقة العمل

اسم المستقل
عدد الإعجابات
0
عدد المشاهدات
16
تاريخ الإضافة