Sentiment Analysis Using NLP and Feature Selection on Customer Feedback Data

تفاصيل العمل

Customer feedback is crucial for shaping business strategies and improving products/services. This analysis uses the Kaggle Customer Feedback Dataset, which includes diverse customer sentiments.

We apply Natural Language Processing (NLP) and Machine Learning (ML) techniques to perform sentiment analysis, using Python's nltk and scikit-learn libraries. Our focus is on efficient text preprocessing, feature selection, and classification model evaluation.

1. Data Preprocessing

Data Cleaning: Handle missing values, remove duplicates.

Text Processing: Lowercase conversion, punctuation removal, stopword removal, stemming (Porter Stemmer).

Tokenization: Count words and characters for exploratory analysis.

2. Feature Selection Techniques

Filter Methods: Statistical measures (e.g., Chi-square).

Wrapper Methods: Recursive Feature Elimination (RFE).

Dimensionality Reduction: PCA to retain key variance features.

3. Sentiment Labeling

Used VADER from NLTK to assign sentiment scores:

Positive, Neutral, Negative (based on compound score thresholds).

Model Training and Evaluation

Pipeline

Vectorization: TF-IDF.

Models Used: Multinomial Naive Bayes, Random Forest, SVM.

Metrics Evaluated: Accuracy, Precision, Recall, F1 Score.

Experimental Setup

Data split into training and testing sets.

Applied preprocessing and vectorization before classification.

Compared baseline model vs. Chi-Square vs. PCA approaches.

ملفات مرفقة

بطاقة العمل

اسم المستقل
عدد الإعجابات
0
عدد المشاهدات
18
تاريخ الإضافة
تاريخ الإنجاز