Project Details

In this project, I used the well-known Titanic passenger dataset to build a clean, structured data foundation. As the Data Engineer, I ingested the raw passenger data, handled missing values, and prepared the dataset for predictive modeling, with a focus on data integrity and consistency.

Step-by-Step Engineering Process:

Step 1: Data Ingestion: I loaded the Titanic CSV data into Google Colab and performed an initial audit to understand the schema and identify corrupted or missing entries.
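
The audit in Step 1 could look roughly like the sketch below. The project's actual code is not shown here, so this is only an illustration: it uses Python's standard library, the five-row SAMPLE_CSV is made-up data standing in for the real file, the column names are assumed to match the commonly distributed Titanic CSV, and audit_missing is a hypothetical helper name.

```python
import csv
import io

# Made-up sample standing in for the Titanic CSV (assumption: the column
# names match the commonly distributed file; the values are illustrative).
SAMPLE_CSV = """PassengerId,Survived,Pclass,Sex,Age,Fare,Cabin,Embarked
1,0,3,male,22,7.25,,S
2,1,1,female,38,71.2833,C85,C
3,1,3,female,,7.925,,S
4,1,1,female,35,53.1,C123,S
5,0,3,male,,8.05,,Q
"""


def audit_missing(csv_text):
    """Return (row_count, {column: number of empty cells})."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = {name: 0 for name in reader.fieldnames}
    row_count = 0
    for row in reader:
        row_count += 1
        for name, value in row.items():
            # An empty string means the cell was blank in the CSV.
            if value is None or value.strip() == "":
                missing[name] += 1
    return row_count, missing


row_count, missing = audit_missing(SAMPLE_CSV)
print(row_count, missing)
```

On this sample the audit flags only the Age and Cabin columns as having gaps, which is the kind of signal that feeds the imputation step.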

Step 2: Data Cleaning & Imputation: I developed a strategy for handling missing values (such as those in the 'Age' and 'Cabin' columns), using statistical methods to keep the dataset high quality.
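
The description does not name the exact statistical methods, so the following is one plausible sketch rather than the project's strategy: it assumes median imputation for Age and replaces the sparse Cabin column with a presence flag. The rows, the impute helper, and the AgeWasMissing and HasCabin field names are all hypothetical.

```python
from statistics import median

# Made-up rows for illustration; None marks a missing value.
passengers = [
    {"Age": 22.0, "Cabin": None},
    {"Age": 38.0, "Cabin": "C85"},
    {"Age": None, "Cabin": None},
    {"Age": 35.0, "Cabin": "C123"},
    {"Age": None, "Cabin": None},
]


def impute(rows):
    """Return new rows with Age filled by the median and Cabin reduced to a flag."""
    known_ages = [r["Age"] for r in rows if r["Age"] is not None]
    age_median = median(known_ages)
    cleaned = []
    for r in rows:
        cleaned.append({
            # Assumption: the median is used because it is robust to outliers.
            "Age": r["Age"] if r["Age"] is not None else age_median,
            # Keep a record of which values were imputed.
            "AgeWasMissing": r["Age"] is None,
            # Cabin is mostly empty, so only its presence is kept.
            "HasCabin": r["Cabin"] is not None,
        })
    return cleaned


cleaned = impute(passengers)
print(cleaned)
```

Keeping an AgeWasMissing flag is a design choice in this sketch: it lets a later model distinguish observed ages from filled-in ones.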

Step 3: Feature Transformation: I processed categorical variables (such as gender and port of embarkation) and converted them into numerical formats suitable for downstream modeling.
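
As an illustration of this kind of transformation, the sketch below maps gender to a binary code and one-hot encodes the embarkation port. The specific encodings are assumptions, not a record of what the project used, and SEX_CODES, PORTS, and encode are hypothetical names.

```python
# Assumed binary coding for the gender column.
SEX_CODES = {"male": 0, "female": 1}

# Port codes in the Titanic data: Cherbourg, Queenstown, Southampton.
PORTS = ("C", "Q", "S")


def encode(row):
    """Turn one passenger's categorical fields into numeric features."""
    # An unexpected category raises KeyError instead of being silently coded.
    encoded = {"Sex": SEX_CODES[row["Sex"]]}
    # One-hot encoding: one 0/1 column per port.
    for port in PORTS:
        encoded[f"Embarked_{port}"] = 1 if row["Embarked"] == port else 0
    return encoded


example = encode({"Sex": "female", "Embarked": "C"})
print(example)
```

One-hot encoding is used for the port because the three ports have no natural order, whereas a single 0/1 column is enough for a two-valued field.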

Step 4: Data Validation & Visualization: I created correlation matrices and distribution charts to check that the cleaned data still reflects the underlying patterns before treating it as production-ready.
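
A correlation matrix like the one mentioned in Step 4 can be computed as in the sketch below. It assumes Pearson correlation, which the description does not specify, and it runs on made-up numeric columns; pearson and correlation_matrix are hypothetical helpers, and the charts themselves are not reproduced.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Note: a constant column has zero variance and would divide by zero.
    return cov / sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {a: {b: correlation}} for every pair of columns."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Made-up cleaned, fully numeric columns for illustration.
columns = {
    "Survived": [0, 1, 1, 1, 0],
    "Sex": [0, 1, 1, 1, 0],
    "Pclass": [3, 1, 3, 1, 3],
    "Fare": [7.25, 71.2833, 7.925, 53.1, 8.05],
}
matrix = correlation_matrix(columns)
for name, row in matrix.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

A quick sanity check on such a matrix is that the diagonal is 1 and the matrix is symmetric; relationships that contradict domain expectations (for example, a positive link between class number and fare) would suggest a cleaning or encoding error.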
