In this project, I developed an end-to-end machine learning system for detecting spam emails, covering every stage from data analysis through model evaluation and performance optimization.
**Project Workflow:**
1. **Data Analysis:**
* Understanding the dataset
* Exploring spam vs. ham distribution
* Extracting most frequent words
2. **Data Preprocessing:**
* Text cleaning (removing punctuation, lowercasing)
* Stopword removal
* Tokenization
* Text vectorization using:
  * CountVectorizer
  * TF-IDF
3. **Modeling:**
* Logistic Regression
* Linear Support Vector Classifier (Linear SVC)
* K-Nearest Neighbors (KNN)
4. **Fine Tuning:**
* Hyperparameter optimization using GridSearchCV
5. **Evaluation:**
* Accuracy
* Precision
* Recall
* F1-Score
* ROC Curve & AUC Score
* Test set performance
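
The preprocessing, vectorization, modeling, and fine-tuning steps above can be sketched as a single scikit-learn pipeline. This is a minimal sketch, not the project's actual code: the toy messages, the `clean_text` helper, and the parameter grid are all assumptions made up for illustration.

```python
import string

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Toy messages invented for this sketch (1 = spam, 0 = ham).
texts = [
    "WIN a FREE prize now!!!",
    "Free cash, click here to claim",
    "Claim your free reward today",
    "Urgent: you won a free lottery prize",
    "Are we still meeting for lunch?",
    "Please review the attached report",
    "See you at the gym tomorrow",
    "Can you send me the meeting notes?",
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]


def clean_text(text):
    """Hypothetical helper: lowercase the text and strip ASCII punctuation."""
    return text.lower().translate(str.maketrans("", "", string.punctuation))


pipeline = Pipeline([
    # TfidfVectorizer handles tokenization and English stopword removal.
    ("tfidf", TfidfVectorizer(preprocessor=clean_text, stop_words="english")),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Assumed grid; real values would depend on the dataset.
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1.0, 10.0],
}

# cv=2 only because the toy dataset is tiny.
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1")
search.fit(texts, labels)

print(search.best_params_)
```

Keeping the vectorizer inside the pipeline means it is refit on each training fold during the grid search, so vocabulary and IDF weights never leak from the validation fold.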
The models were compared on held-out test data, and the best-performing one was selected.
**Use Cases:**
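
A comparison like this might look as follows. It is a rough sketch under stated assumptions: the toy messages, the `ranking_scores` helper, the `n_neighbors=3` setting, and the use of F1 as the selection criterion are illustrative choices, not details taken from the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy messages invented for this sketch (1 = spam, 0 = ham).
texts = [
    "win a free prize now",
    "free cash click here to claim",
    "claim your free reward today",
    "urgent you won a free lottery prize",
    "free entry win cash now",
    "click to claim your prize today",
    "are we still meeting for lunch",
    "please review the attached report",
    "see you at the gym tomorrow",
    "can you send me the meeting notes",
    "lunch tomorrow at the usual place",
    "the report is attached for review",
]
labels = [1] * 6 + [0] * 6

# Stratify so the small test set contains both classes.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=4, stratify=labels, random_state=0
)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Linear SVC": LinearSVC(),
    "KNN": KNeighborsClassifier(n_neighbors=3),
}


def ranking_scores(model, X):
    """Hypothetical helper: continuous scores for ROC AUC.

    Uses decision_function when available, else the spam-class probability.
    """
    if hasattr(model, "decision_function"):
        return model.decision_function(X)
    return model.predict_proba(X)[:, 1]


results = {}
for name, estimator in models.items():
    model = make_pipeline(TfidfVectorizer(), estimator)
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = {
        "accuracy": accuracy_score(y_test, y_pred),
        "precision": precision_score(y_test, y_pred, zero_division=0),
        "recall": recall_score(y_test, y_pred, zero_division=0),
        "f1": f1_score(y_test, y_pred, zero_division=0),
        "roc_auc": roc_auc_score(y_test, ranking_scores(model, X_test)),
    }

# Assumed selection rule: highest F1 on the test split.
best = max(results, key=lambda n: results[n]["f1"])
for name, metrics in results.items():
    print(name, {k: round(v, 3) for k, v in metrics.items()})
print("Best by F1:", best)
```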
* Email spam filtering systems
* Fraud detection
* Text classification tasks
**Technologies Used:**
Python – Pandas – NumPy – Scikit-learn – NLP
---
I focus on building efficient, data-driven machine learning solutions to real-world problems.