Project Details

A multi-stage data extraction and transformation pipeline that combines JavaScript and Python to scrape and structure global country profiles.

Rather than a single scraping script, this project is a staged data pipeline: it renders JavaScript-driven pages before extracting from them, and it writes intermediate results to structured storage between the extraction and transformation stages.
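The intermediate-storage idea can be illustrated with a minimal Python sketch, assuming the extraction stage hands over a list of dictionaries. The function names `save_raw_records` and `load_raw_records` are hypothetical and not taken from the project. The sketch writes to a temporary file first and then swaps it into place with `os.replace`, so an interrupted run does not leave a half-written JSON file behind.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path


def save_raw_records(records: list[dict], path: str | Path) -> None:
    """Write scraped records to a JSON file without risking a partial file."""
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temporary file in the same directory first.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            # ensure_ascii=False keeps non-English country names readable.
            json.dump(records, f, ensure_ascii=False, indent=2)
            f.flush()
            os.fsync(f.fileno())
        # Swap the finished file into place; this rename is atomic on POSIX.
        os.replace(tmp_name, path)
    except BaseException:
        # Remove the temporary file if anything went wrong, then re-raise.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_raw_records(path: str | Path) -> list[dict]:
    """Read the records back for the next pipeline stage."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```

Keeping the raw output on disk in this form means the transformation stage can be rerun or changed without scraping the pages again.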

Key Features & Architecture:

- Dynamic Extraction: Used Playwright with JavaScript to inject scripts into the rendered pages and extract raw country profiles (Country Names, Capitals, Population, and Total Area).

- Intermediate Storage: Saved the raw extracted data to structured JSON files, so the scraped results are preserved even if a later processing step fails.

- Python Transformation: Built a Python post-processing script using Pandas to read the JSON data, clean it, and reorganize it into a production-ready Excel (.xlsx) file.
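The transformation step can be sketched with Pandas as follows. This is a minimal illustration, not the project's actual script: it assumes the raw JSON is a list of records with the hypothetical keys `name`, `capital`, `population`, and `area`, holding scraped text such as `"1,234,567"`. The function names, column headers, and sheet name are also hypothetical, and `to_excel` needs an Excel engine such as `openpyxl` installed.

```python
from __future__ import annotations

import json

import pandas as pd

# Hypothetical raw JSON keys mapped to the final spreadsheet headers.
COLUMNS = {
    "name": "Country Name",
    "capital": "Capital",
    "population": "Population",
    "area": "Total Area",
}


def clean_countries(records: list[dict]) -> pd.DataFrame:
    """Turn raw scraped records into a tidy, typed DataFrame."""
    df = pd.DataFrame(records, columns=list(COLUMNS))
    # Trim stray whitespace left over from page text.
    for col in ("name", "capital"):
        df[col] = df[col].str.strip()
    # Drop thousands separators and parse numbers; unparseable text becomes NaN.
    for col in ("population", "area"):
        df[col] = pd.to_numeric(
            df[col].astype(str).str.replace(",", "", regex=False),
            errors="coerce",
        )
    # Remove unnamed and duplicate rows, sort, and apply the final headers.
    return (
        df.dropna(subset=["name"])
        .drop_duplicates(subset=["name"])
        .sort_values("name")
        .reset_index(drop=True)
        .rename(columns=COLUMNS)
    )


def export_to_excel(json_path: str, xlsx_path: str) -> None:
    """Read the intermediate JSON, clean it, and write a single-sheet workbook."""
    with open(json_path, encoding="utf-8") as f:
        records = json.load(f)
    # to_excel needs an engine such as openpyxl installed.
    clean_countries(records).to_excel(xlsx_path, sheet_name="Countries", index=False)
```

Coercing bad numeric text to `NaN` instead of raising keeps one malformed row from stopping the whole export; the empty cells can then be reviewed in the spreadsheet.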

Technologies Used:

- Playwright (JavaScript)

- Python

- Pandas

- JSON & Excel (XLSX)
