news classification nlp

تفاصيل العمل

Overview

Built an NLP classifier that categorizes news articles into 31 distinct categories using logistic regression with CountVectorizer.

Data Processing

- Dataset: Sourced from Kaggle (news headlines with categories)

- Preprocessing Pipeline:

- Text cleaning (special characters, lowercase)

- Stopword removal (English)

- Lemmatization (WordNet)

- Label encoding (31 categories)

Model Architecture

- Vectorization: CountVectorizer (unigrams)

- Classifier: Logistic Regression

- L2 regularization (C=5.0)

- lbfgs solver

- 5000 max iterations

Performance

Achieved classification through:

- Stratified train-test split (80-20)

- Comprehensive text preprocessing

- Hyperparameter tuning

Deployment

Built a Streamlit web app featuring:

- Clean, responsive UI with custom CSS

- Real-time category prediction

- Error handling and loading states

- Category mapping system

Technical Stack

- Python (Pandas, NLTK, Scikit-learn)

- Streamlit for web interface

- Joblib for model persistence

The project demonstrates end-to-end NLP pipeline development from preprocessing to deployment.

This version is:

1. Professional and technical

2. Concise (easy to read on LinkedIn)

3. Highlights key achievements

4. Structured for readability

5. Focused on measurable outcomes

معاينة

بطاقة العمل

اسم المستقل

Adham N.

عدد الإعجابات

تاريخ الإضافة

06/02/2025

تاريخ الإنجاز

25/03/2025

المهارات

news classification nlp

تفاصيل العمل

بطاقة العمل

روابط

تابع مستقل على

وسائل الدفع المتاحة