Web Scraping Task

The goal of this task was to scrape a target webpage, extract several types of data from it, and store the results in structured formats (CSV and JSON).

- Features

Extract Headings (h1, h2)

Extract Paragraphs (p)

Extract Lists (li)

Extract Table Data (tr, td), grouped by row

Extract Form Field Information (field names, input types, default values)

Extract Video Link
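The feature list above can be sketched as one extraction function. This is a minimal sketch, not the project's actual code. It assumes the page HTML is already in a string and is parsed with BeautifulSoup's built-in "html.parser". The helper names (`clean`, `field_info`, `extract_all`) and the dictionary keys are illustrative. The video lookup assumes the link sits in the `src` of an `<iframe>`, `<video>` or `<source>` tag. Header cells (`th`) are collected alongside `td`, which is also an assumption.

```python
import re

from bs4 import BeautifulSoup


def clean(text):
    # Collapse runs of spaces/tabs/newlines into one space and trim both ends
    return re.sub(r"\s+", " ", text).strip()


def field_info(field):
    # A <textarea> keeps its default value as inner text, not in a value attribute
    if field.name == "textarea":
        return {"name": field.get("name", ""), "type": "textarea",
                "default": clean(field.get_text())}
    # An <input> without a type attribute behaves as type "text";
    # a missing value attribute becomes "" instead of raising KeyError
    return {"name": field.get("name", ""), "type": field.get("type", "text"),
            "default": field.get("value", "")}


def extract_all(html):
    soup = BeautifulSoup(html, "html.parser")
    # Assumption: the video is embedded as an <iframe>, <video> or <source> with a src
    video = (soup.find("iframe", src=True) or soup.find("video", src=True)
             or soup.find("source", src=True))
    return {
        "headings": [clean(h.get_text()) for h in soup.find_all(["h1", "h2"])],
        "paragraphs": [clean(p.get_text()) for p in soup.find_all("p")],
        "list_items": [clean(li.get_text()) for li in soup.find_all("li")],
        # One inner list per <tr> keeps the cells grouped by row
        "table_rows": [[clean(c.get_text()) for c in tr.find_all(["th", "td"])]
                       for tr in soup.find_all("tr")],
        "form_fields": [field_info(f) for f in soup.find_all(["input", "textarea"])],
        "video_link": video["src"] if video else None,
    }
```

In practice `html` would be the `response.text` returned by requests. Keeping each `<tr>` as its own inner list is what allows the table to be written later as one CSV line per row.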

- Tools & Libraries

Python

requests – fetch webpage content

BeautifulSoup – parse HTML and extract elements

csv – store structured data in CSV files

json – store extracted data in JSON format

- Output Files

CSV Files:

Extract_Text_Data.csv – Combined extracted text in a structured table

Extract_Table_Data.csv – Extracted table data only
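One way to write the two CSV files is with the standard `csv` module. The function names are hypothetical. The `type`/`text` column layout for Extract_Text_Data.csv is an assumed layout, not one confirmed by the project. Each file is opened with `newline=""`, as the `csv` documentation recommends, so that no extra `\r` characters are added on platforms with `\r\n` line endings.

```python
import csv


def write_text_csv(path, data):
    # Assumed layout: one row per extracted item, tagged with the element group it came from
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["type", "text"])
        for kind in ("headings", "paragraphs", "list_items"):
            for text in data.get(kind, []):
                writer.writerow([kind, text])


def write_table_csv(path, rows):
    # Each inner list (one <tr>) becomes one CSV line, so row grouping is preserved
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)
```

The `csv` writer quotes fields that contain commas or newlines automatically, so cleaned text such as "Hello, world" survives a round trip.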

JSON Files:

Product_Information.json – Book title, price, stock availability, and button text

Form_Information.json – Field name, input type, and default values

Video_Link.json – Video link
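A sketch of the JSON side follows. It assumes each output is already a plain dict or list built during extraction. `write_json` and `save_json_outputs` are hypothetical names. The single `video_link` key in Video_Link.json is an assumed layout.

```python
import json
import os


def write_json(path, payload):
    # ensure_ascii=False keeps characters such as "£" readable instead of "\u00a3"
    with open(path, "w", encoding="utf-8") as f:
        json.dump(payload, f, ensure_ascii=False, indent=2)


def save_json_outputs(product, form_fields, video_link, out_dir="."):
    write_json(os.path.join(out_dir, "Product_Information.json"), product)
    write_json(os.path.join(out_dir, "Form_Information.json"), form_fields)
    # Wrapping the single link in an object leaves room for more fields later
    write_json(os.path.join(out_dir, "Video_Link.json"), {"video_link": video_link})
```

`indent=2` trades a slightly larger file for output that is easy to inspect by hand.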

- Approach

Sent an HTTP request using requests to fetch the HTML content.

Parsed the HTML using BeautifulSoup to locate target elements.

Extracted and cleaned the data (removing extra spaces and newlines).

Stored the data in multiple CSV and JSON files.
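The fetch and clean steps of this approach might look like the following. This assumes a plain GET request is enough, with no login or JavaScript rendering. The User-Agent string and the 10-second timeout are placeholder choices, not values taken from the project.

```python
import re

import requests


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text."""
    # Placeholder User-Agent (assumption): some sites reject the library's default one
    headers = {"User-Agent": "Mozilla/5.0 (compatible; scraping-task/1.0)"}
    response = requests.get(url, headers=headers, timeout=timeout)
    # Fail loudly on 4xx/5xx responses instead of parsing an error page
    response.raise_for_status()
    return response.text


def clean_text(text):
    """Collapse spaces, tabs and newlines into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()
```

In Python 3 string patterns, `\s` matches Unicode whitespace. `clean_text` therefore also collapses the non-breaking spaces (`\xa0`) that often appear in scraped HTML.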

- Challenges Faced

Handling different HTML structures for headings, lists, tables, and forms.

Extracting default values from form fields, since not every field defines one.

Grouping table cells correctly when saving in CSV/JSON formats.

Cleaning and formatting text for consistency.

Managing multiple output files without overwriting data.
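For the last challenge, one simple approach is to choose a free filename instead of reopening an existing file in `"w"` mode. `unique_path` is a hypothetical helper. Appending `_1`, `_2` and so on to the file stem is an assumed naming scheme.

```python
from pathlib import Path


def unique_path(path):
    """Return `path` if nothing exists there yet, else the first free `stem_N.suffix` variant."""
    original = Path(path)
    candidate = original
    counter = 1
    while candidate.exists():
        candidate = original.with_name(f"{original.stem}_{counter}{original.suffix}")
        counter += 1
    return candidate
```

This check-then-write pattern is not safe if several scraper runs write at the same moment. Opening the file with mode `"x"` (exclusive creation, which raises `FileExistsError` when the file already exists) closes that gap.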
