# Sentiment Analysis with NLP and Oversampling
This project applies Natural Language Processing (NLP) techniques to classify text as positive or negative in sentiment. It also addresses class imbalance with oversampling methods such as SMOTE, ADASYN, and RandomOverSampler.
## Objective
To build a robust sentiment analysis pipeline that performs well even when the dataset is imbalanced.
## Workflow
1. Data Loading & Cleaning
2. Text Preprocessing:
   - Lowercasing
   - Removing stopwords
   - Tokenization
   - Lemmatization
3. Text Vectorization (TF-IDF)
4. Train/Test Split
5. Oversampling the minority class
6. Model Training:
   - Logistic Regression
   - Random Forest
   - XGBoost
7. Evaluation (F1-score, Recall, Precision, Confusion Matrix)
## Technologies
- Python
- NLTK
- Scikit-learn
- imbalanced-learn
- XGBoost
## Oversampling Techniques
- SMOTE
- ADASYN
- RandomOverSampler
## Results
Oversampling the training data noticeably improved the recall and F1-score of the minority class. See the notebook for the full evaluation metrics.