Project Details

I worked on an end-to-end NLP project for Sentiment Analysis of Arabizi text (Arabic written in Latin characters, commonly used in social media).

The project is built using Transformer-based models and focuses on handling the challenges of noisy, informal, and mixed-language text.

Project Pipeline:

We started by collecting and exploring a suitable dataset for sentiment analysis.

Since most available data is in Arabic script, we used LangChain with a Gemini model to convert the Arabic text into Arabizi and normalize it, making the dataset consistent and usable for training.
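The conversion step can be sketched roughly as below. This is a minimal illustration, not the project's actual code: the prompt wording, the `build_prompt` and `to_arabizi` helpers, and the whitespace/lowercase normalization are all assumptions. The model call is passed in as a plain callable named `generate`, which in the real pipeline would presumably wrap the LangChain call to Gemini; here a stub stands in for it so the example runs offline.

```python
from typing import Callable

# Hypothetical prompt; the wording used in the project is not known.
PROMPT_TEMPLATE = (
    "Convert the following Arabic text into Arabizi (Arabic written in Latin "
    "characters and digits such as 3, 7, and 2). Keep the meaning and sentiment "
    "unchanged and return only the converted text.\n\nArabic text: {text}"
)


def build_prompt(arabic_text: str) -> str:
    """Fill the prompt template with one Arabic sentence."""
    return PROMPT_TEMPLATE.format(text=arabic_text.strip())


def to_arabizi(arabic_text: str, generate: Callable[[str], str]) -> str:
    """Convert one sentence via `generate`, then normalize the output.

    `generate` is any function that maps a prompt to model text
    (assumed here; e.g. a thin wrapper around an LLM client).
    """
    raw = generate(build_prompt(arabic_text))
    # Assumed normalization: collapse whitespace and lowercase.
    return " ".join(raw.split()).lower()


def fake_generate(prompt: str) -> str:
    """Offline stand-in for the model call, used only for this demo."""
    return "  El Akl 7elw  "


if __name__ == "__main__":
    print(to_arabizi("الأكل حلو", fake_generate))
```

Keeping the model behind a callable makes the normalization logic easy to test without network access or an API key.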

A custom tokenizer was trained/optimized to properly understand Arabizi structure, including slang, abbreviations, and mixed character patterns.
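To illustrate what training a subword tokenizer on Arabizi involves, here is a toy byte-pair-encoding (BPE) learner in pure Python. It is only a sketch of the general technique: the source does not say which tokenizer algorithm or library the project used, and the function names and the tiny corpus are made up for the example.

```python
from collections import Counter
from typing import List, Tuple

Pair = Tuple[str, str]


def merge_pair(symbols: Tuple[str, ...], pair: Pair) -> Tuple[str, ...]:
    """Replace every adjacent occurrence of `pair` with one merged symbol."""
    merged = []
    i = 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(corpus: List[str], num_merges: int) -> List[Pair]:
    """Learn up to `num_merges` merge rules from a list of words."""
    # Each word starts as a sequence of single characters.
    vocab = Counter(tuple(word) for word in corpus)
    merges: List[Pair] = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pair_counts: Counter = Counter()
        for symbols, freq in vocab.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break
        # Most frequent pair wins; ties are broken deterministically.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges.append(best)
        new_vocab: Counter = Counter()
        for symbols, freq in vocab.items():
            new_vocab[merge_pair(symbols, best)] += freq
        vocab = new_vocab
    return merges


def tokenize(word: str, merges: List[Pair]) -> List[str]:
    """Apply the learned merges, in order, to a single word."""
    symbols = tuple(word)
    for pair in merges:
        symbols = merge_pair(symbols, pair)
    return list(symbols)


if __name__ == "__main__":
    corpus = ["7abibi", "7abibti", "7abib", "3ala", "3alam"]
    merges = learn_bpe_merges(corpus, num_merges=10)
    print(tokenize("7abibi", merges))
```

Because digits such as 7 and 3 stand for Arabic letters in Arabizi, a tokenizer trained on Arabizi text can learn them as ordinary parts of subwords rather than treating them as numbers.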

We then fine-tuned a Transformer-based model for sentiment classification (Positive / Negative / Neutral).

The model achieved over 87% accuracy, showing strong performance on noisy real-world text.

Handling Mixed / Complex Sentences:

For sentences that contain mixed sentiments or multiple parts, we designed a scoring strategy where:

The sentence is split into meaningful segments

Each segment gets its own sentiment score

The final sentiment is calculated based on the highest weighted score, giving a more accurate overall prediction
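The scoring strategy described above could look something like the following sketch. Several details are assumptions, since the source does not specify them: segments are split on punctuation and on the contrast words "bas" and "lakin", each segment is weighted by its word count times the classifier's confidence, and the label with the highest weighted total wins. The segment classifier is passed in as a callable; `toy_score_segment` and its small word lists are a stand-in for the fine-tuned model, used only so the example runs.

```python
import re
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Assumed segment boundaries: punctuation and two common contrast words.
SPLIT_PATTERN = re.compile(r"[.,;!?]+|\b(?:bas|lakin)\b", flags=re.IGNORECASE)

# Toy lexicon for the demo only; the real system uses the fine-tuned model.
POSITIVE = {"7elw", "gamil", "tamam"}
NEGATIVE = {"we7esh", "ghali", "zift"}


def split_segments(sentence: str) -> List[str]:
    """Split a sentence into non-empty, stripped segments."""
    return [p.strip() for p in SPLIT_PATTERN.split(sentence) if p.strip()]


def toy_score_segment(segment: str) -> Tuple[str, float]:
    """Return (label, confidence) for one segment using the toy lexicon."""
    words = segment.lower().split()
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive", 1.0
    if neg > pos:
        return "negative", 1.0
    return "neutral", 0.5


def aggregate_sentiment(
    sentence: str,
    score_segment: Callable[[str], Tuple[str, float]],
) -> str:
    """Pick the label with the highest weighted total across segments."""
    totals: Dict[str, float] = defaultdict(float)
    for segment in split_segments(sentence):
        label, confidence = score_segment(segment)
        weight = len(segment.split())  # assumed weight: segment length in words
        totals[label] += weight * confidence
    if not totals:
        return "neutral"  # nothing to score
    return max(totals, key=totals.get)


if __name__ == "__main__":
    text = "el akl 7elw awi, bas el se3r ghali"
    print(split_segments(text))
    print(aggregate_sentiment(text, toy_score_segment))
```

Other weighting schemes (for example, giving more weight to the segment after a contrast word) would fit the same structure by changing only the `weight` line.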

Technologies Used:

Transformers, LangChain, Gemini API, Python, NLP preprocessing, Fine-tuning pipelines

Outcome:

A robust sentiment analysis system capable of understanding Arabizi text with high accuracy, even in challenging real-world scenarios involving slang, noise, and mixed expressions.
