Project Title: Question Type Classification using Machine Learning (SVM)
Description:
This project builds a Natural Language Processing (NLP) model that automatically classifies questions by type: WHAT, WHO, WHEN, WHERE, WHY, HOW, WHICH, and OTHER.
The system combines text preprocessing, TF-IDF feature extraction, and a linear SVM classifier.
⚙️ Key Features:
Automatic Labeling:
Questions are labeled based on their starting word using regex patterns.
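A minimal sketch of start-word labeling, assuming the match is case-insensitive and that anything not starting with a listed question word falls into OTHER. The name `label_question` and the exact pattern are illustrative; the project's actual regex patterns are not shown in this README.

```python
import re

# Question types named in this README; OTHER is the fallback label.
QUESTION_TYPES = ["WHAT", "WHO", "WHEN", "WHERE", "WHY", "HOW", "WHICH"]

# Match one question word at the very start of the text, ignoring case and
# leading whitespace. The trailing \b stops "whose" or "however" from matching.
_START_WORD = re.compile(
    r"^\s*(" + "|".join(QUESTION_TYPES) + r")\b", re.IGNORECASE
)


def label_question(text: str) -> str:
    """Return the question type implied by the first word, or OTHER."""
    match = _START_WORD.match(text)
    return match.group(1).upper() if match else "OTHER"


examples = ["What is an SVM?", "who wrote Hamlet?", "Is it raining?", "However you like."]
labels = [label_question(q) for q in examples]
print(labels)  # ['WHAT', 'WHO', 'OTHER', 'OTHER']
```

Labels produced this way can serve as training targets without manual annotation, at the cost of mislabeling questions whose type is not signaled by the first word.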
Text Preprocessing:
Lowercasing
Punctuation removal
Stopword removal
Stemming with PorterStemmer
Lemmatization with WordNetLemmatizer
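The preprocessing steps above could be chained roughly as follows. This is a sketch under stated assumptions, not the project's code: the order (lemmatize, then stem) is a guess, and the wh-words are deliberately kept because NLTK's English stopword list contains them and removing them would discard the main classification signal. The fallbacks only exist so the example runs when the NLTK corpora have not been downloaded.

```python
import string

from nltk.stem import PorterStemmer, WordNetLemmatizer

# Assumption: keep the wh-words, even though NLTK lists them as stopwords.
QUESTION_WORDS = {"what", "who", "when", "where", "why", "how", "which"}

try:
    from nltk.corpus import stopwords

    STOPWORDS = set(stopwords.words("english")) - QUESTION_WORDS
except LookupError:
    # Corpus missing (run nltk.download("stopwords")); tiny stand-in list.
    STOPWORDS = {"a", "an", "the", "is", "are", "of", "in", "to", "do", "does"}

stemmer = PorterStemmer()
lemmatizer = WordNetLemmatizer()
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def _lemmatize(token: str) -> str:
    try:
        return lemmatizer.lemmatize(token)
    except LookupError:
        # WordNet data missing (run nltk.download("wordnet")); keep the token.
        return token


def preprocess(text: str) -> str:
    """Lowercase, strip punctuation, drop stopwords, then lemmatize and stem."""
    tokens = text.lower().translate(_PUNCT_TABLE).split()
    tokens = [t for t in tokens if t not in STOPWORDS]
    return " ".join(stemmer.stem(_lemmatize(t)) for t in tokens)


cleaned = preprocess("What is the capital of France?")
print(cleaned)
```

Applying both a lemmatizer and a stemmer is largely redundant, since the stemmer has the final say on the token form; the sketch keeps both only because the feature list names both.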
Feature Extraction:
TF-IDF Vectorizer
Unigrams and bigrams
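As an illustration of this step, scikit-learn's `TfidfVectorizer` extracts unigrams and bigrams when `ngram_range=(1, 2)` is set. The three toy questions are invented for the example and all other parameters are left at their defaults, which may differ from the project's settings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy, already-preprocessed questions (illustrative only, not project data).
questions = [
    "what capital france",
    "who wrote hamlet",
    "when world war end",
]

# ngram_range=(1, 2) extracts single words and adjacent word pairs.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X = vectorizer.fit_transform(questions)  # sparse matrix, one row per question

vocabulary = sorted(vectorizer.vocabulary_)  # learned unigrams and bigrams
print(X.shape)
print(vocabulary)
```

Bigrams such as "who wrote" let a linear model weight short phrases in addition to individual words.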
Model Training:
Support Vector Machine (SVM) via LinearSVC
The model is trained on the training split
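A hedged sketch of the training step: labels are encoded with `LabelEncoder`, the text is split into train and test sets, the vectorizer is fitted on the training text only, and a `LinearSVC` is trained on the result. The toy questions, the 75/25 split, and `random_state=42` are assumptions made for the example, not details taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# Toy data for illustration only; the project trains on its own dataset.
questions = [
    "what is machine learning", "what time is it",
    "what does this function return", "what causes rain",
    "who wrote hamlet", "who is the president",
    "who invented the telephone", "who won the match",
    "when did the war end", "when is the meeting",
    "when was python released", "when does the store open",
]
labels = ["WHAT"] * 4 + ["WHO"] * 4 + ["WHEN"] * 4

# Map string labels to integers and back.
encoder = LabelEncoder()
y = encoder.fit_transform(labels)

# Stratify so every class appears in both splits.
X_train_text, X_test_text, y_train, y_test = train_test_split(
    questions, y, test_size=0.25, stratify=y, random_state=42
)

# Fit TF-IDF on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
X_train = vectorizer.fit_transform(X_train_text)
X_test = vectorizer.transform(X_test_text)

model = LinearSVC()
model.fit(X_train, y_train)

predicted = encoder.inverse_transform(model.predict(X_test))
print(list(zip(X_test_text, predicted)))
```

Fitting the vectorizer before the split would leak test-set vocabulary and document frequencies into training, which is why the fit happens after `train_test_split`.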
Evaluation Metrics:
Accuracy
Precision
Recall
F1 Score
Classification Report
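These metrics can be computed with `sklearn.metrics`, as in the sketch below. The label lists are hand-written for illustration and are not results from this project; weighted averaging is an assumption, since the README does not say how per-class scores are combined.

```python
from sklearn.metrics import (
    accuracy_score,
    classification_report,
    precision_recall_fscore_support,
)

# Hand-written labels for illustration only (not project results).
y_true = ["WHAT", "WHO", "WHEN", "WHAT", "WHO", "WHEN"]
y_pred = ["WHAT", "WHO", "WHEN", "WHAT", "WHEN", "WHEN"]

accuracy = accuracy_score(y_true, y_pred)

# Assumption: per-class scores are averaged weighted by class support.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="weighted", zero_division=0
)

# Per-class precision, recall, F1, and support as a formatted text table.
report = classification_report(y_true, y_pred, zero_division=0)

print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} f1={f1:.3f}")
print(report)
```

With weighted averaging, recall always equals accuracy, so the per-class rows of the classification report are usually the more informative view on imbalanced data.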
Visualization:
Bar chart showing model performance
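One way to draw such a chart with Matplotlib is sketched here. The helper `plot_metrics`, the output file name, and the scores passed to it are all hypothetical; the scores are placeholders and do not reflect this project's performance.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # headless backend, so no display is required
import matplotlib.pyplot as plt


def plot_metrics(metrics: dict, path: str) -> None:
    """Save a bar chart with one bar per metric (values expected in [0, 1])."""
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.bar(list(metrics.keys()), list(metrics.values()))
    ax.set_ylim(0, 1)
    ax.set_ylabel("Score")
    ax.set_title("Model performance")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)  # release the figure's memory


# Placeholder scores, NOT results from this project.
placeholder_scores = {"Accuracy": 0.5, "Precision": 0.5, "Recall": 0.5, "F1": 0.5}
chart_path = os.path.join(tempfile.gettempdir(), "metrics.png")
plot_metrics(placeholder_scores, chart_path)
```

Fixing the y-axis to the range 0 to 1 keeps charts from different runs visually comparable.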
Model Saving:
The following are saved with joblib:
The model
The vectorizer
The label encoder
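A sketch of persisting and reloading the three artifacts with `joblib.dump` and `joblib.load`. The file names and the tiny stand-in model are assumptions for the example; the README does not state the actual file names.

```python
import os
import tempfile

import joblib
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import LinearSVC

# Tiny stand-in artifacts so the example runs on its own.
questions = ["what is this", "who is she", "what was that", "who was he"]
labels = ["WHAT", "WHO", "WHAT", "WHO"]
encoder = LabelEncoder()
vectorizer = TfidfVectorizer(ngram_range=(1, 2))
model = LinearSVC().fit(
    vectorizer.fit_transform(questions), encoder.fit_transform(labels)
)

# Hypothetical file names; written to a temporary directory.
out_dir = tempfile.mkdtemp()
artifacts = {
    "svm_model.joblib": model,
    "tfidf_vectorizer.joblib": vectorizer,
    "label_encoder.joblib": encoder,
}
for name, obj in artifacts.items():
    joblib.dump(obj, os.path.join(out_dir, name))

# Reload all three; inference needs them together.
loaded_model = joblib.load(os.path.join(out_dir, "svm_model.joblib"))
loaded_vectorizer = joblib.load(os.path.join(out_dir, "tfidf_vectorizer.joblib"))
loaded_encoder = joblib.load(os.path.join(out_dir, "label_encoder.joblib"))


def predict(texts):
    """Vectorize, classify, and map integer predictions back to label names."""
    features = loaded_vectorizer.transform(texts)
    return loaded_encoder.inverse_transform(loaded_model.predict(features))


print(predict(["who is there"]))
```

All three objects have to be saved because the model only understands the fitted vectorizer's feature indices and the encoder's integer-to-label mapping.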
Output:
A CSV file containing:
The original data
The predicted question type (Predicted Label)
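Writing such a file with pandas might look like the sketch below. The helper `save_predictions`, the input column name `question`, and the output path are hypothetical, and the predictions are hard-coded stand-ins for real model output.

```python
import os
import tempfile

import pandas as pd


def save_predictions(df: pd.DataFrame, predictions, path: str) -> pd.DataFrame:
    """Write the original columns plus a predicted-label column to CSV."""
    if len(predictions) != len(df):
        raise ValueError("one prediction is required per row")
    out = df.copy()  # leave the caller's DataFrame untouched
    out["Predicted Label"] = list(predictions)
    out.to_csv(path, index=False)
    return out


# Stand-in input data and predictions (illustrative only).
data = pd.DataFrame({"question": ["What is an SVM?", "Who wrote Hamlet?"]})
predictions = ["WHAT", "WHO"]

csv_path = os.path.join(tempfile.gettempdir(), "predictions.csv")
result = save_predictions(data, predictions, csv_path)
print(result)
```

Passing `index=False` keeps the pandas row index out of the file, so the CSV contains only the original columns and the new label column.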
Technologies Used:
Python
Pandas
NLTK
Scikit-learn
Matplotlib
Joblib
Project Goal:
To build an efficient and automated system capable of understanding and classifying question types, which can be useful in:
Chatbots
Question Answering Systems
Search Engines
Future Improvements:
Use deep learning models (e.g., LSTM or BERT)
Improve preprocessing (e.g., by taking context into account)
Increase the dataset size to improve performance