This project builds a machine learning model that classifies SMS messages as either spam or ham (legitimate). The messages are preprocessed with tokenization and vectorization, and the resulting features are reduced in dimensionality with Principal Component Analysis (PCA).
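The preprocessing steps can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn, uses `TfidfVectorizer` as one possible vectorizer, and runs on a handful of made-up messages. The sparse matrix is densified here so that PCA receives a dense array, which is only practical for small samples.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy messages; the project's real dataset is not shown here.
messages = [
    "WIN a free prize now, call today",
    "Are we still meeting for lunch?",
    "Free entry in a weekly cash prize draw",
    "I'll call you after the meeting",
    "URGENT: claim your free cash reward",
    "Can you send me the lunch menu?",
]

# Tokenize and vectorize: each message becomes a row of TF-IDF weights,
# one column per distinct token in the vocabulary.
vectorizer = TfidfVectorizer(lowercase=True)
X_sparse = vectorizer.fit_transform(messages)

# Convert to a dense array before PCA (an assumption made for this sketch).
X_dense = X_sparse.toarray()

# Project the token features onto 3 principal components.
# n_components must not exceed min(n_samples, n_features).
pca = PCA(n_components=3, random_state=0)
X_reduced = pca.fit_transform(X_dense)

print("features before PCA:", X_dense.shape[1])
print("features after PCA:", X_reduced.shape[1])
```

The number of components (3) is arbitrary here; on a real dataset it would normally be chosen by inspecting the explained variance or by tuning.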
A Logistic Regression model is trained, and its hyperparameters are tuned with GridSearchCV to select the best-performing configuration. The model is evaluated with accuracy, precision, recall, and F1-score.
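A sketch of how the training, tuning, and evaluation steps might fit together is shown below. It assumes scikit-learn and a small made-up dataset. The parameter grid, the `to_dense` helper, and the step names (`tfidf`, `dense`, `pca`, `clf`) are illustrative choices, not the settings used in this project.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert a sparse matrix to a dense array for PCA."""
    return X.toarray()


# Hypothetical toy data: 1 = spam, 0 = ham.
spam = [
    "WIN a free prize now, call today",
    "Free entry in a weekly cash prize draw",
    "URGENT: claim your free cash reward",
    "Congratulations, you won a free phone, reply now",
    "Cheap loans approved today, call now",
    "Claim your prize, text WIN to enter",
    "Free ringtone, reply YES to claim",
    "You have won cash, call to claim now",
]
ham = [
    "Are we still meeting for lunch?",
    "I'll call you after the meeting",
    "Can you send me the lunch menu?",
    "See you at home tonight",
    "Thanks for the help with my homework",
    "Running late, be there in ten minutes",
    "Did you finish the report for class?",
    "Let's watch the game at my place tonight",
]
texts = spam + ham
labels = [1] * len(spam) + [0] * len(ham)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer()),
    ("dense", FunctionTransformer(to_dense, accept_sparse=True)),
    ("pca", PCA(random_state=0)),
    ("clf", LogisticRegression(max_iter=1000)),
])

# Illustrative grid: number of principal components and regularization strength.
param_grid = {
    "pca__n_components": [2, 4],
    "clf__C": [0.1, 1.0, 10.0],
}

# 3-fold cross-validated grid search, scored by accuracy.
search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(X_train, y_train)

# Evaluate the refitted best pipeline on the held-out messages.
y_pred = search.best_estimator_.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
precision = precision_score(y_test, y_pred, zero_division=0)
recall = recall_score(y_test, y_pred, zero_division=0)
f1 = f1_score(y_test, y_pred, zero_division=0)

print("best parameters:", search.best_params_)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall, "F1:", f1)
```

Wrapping every step in a single `Pipeline` means the vectorizer and PCA are refitted inside each cross-validation fold, so no information from the validation split leaks into preprocessing.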
Accuracy before PCA: 96.23%
Accuracy after PCA: 95.42%
Cross-Validation Accuracy: 96.86%
The results show that the model classifies messages reliably, with accuracy above 95% in every configuration. PCA reduces the number of features, and with it the computational cost, at the price of a 0.81 percentage point drop in accuracy (from 96.23% to 95.42%).
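A before-and-after comparison of this kind could be set up along the following lines. This is only a sketch under assumptions: it uses scikit-learn's `cross_val_score` on a small made-up dataset, the `to_dense` helper and the component count are hypothetical, and the scores it prints are unrelated to the figures reported above.

```python
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer


def to_dense(X):
    """Hypothetical helper: convert a sparse matrix to a dense array for PCA."""
    return X.toarray()


# Hypothetical toy data: 1 = spam, 0 = ham.
texts = [
    "WIN a free prize now, call today",
    "Free entry in a weekly cash prize draw",
    "URGENT: claim your free cash reward",
    "Cheap loans approved today, call now",
    "Claim your prize, text WIN to enter",
    "You have won cash, call to claim now",
    "Are we still meeting for lunch?",
    "I'll call you after the meeting",
    "Can you send me the lunch menu?",
    "See you at home tonight",
    "Running late, be there in ten minutes",
    "Did you finish the report for class?",
]
labels = [1] * 6 + [0] * 6

# Baseline: the classifier sees the full TF-IDF feature space.
without_pca = make_pipeline(
    TfidfVectorizer(),
    LogisticRegression(max_iter=1000),
)

# Variant: the same features projected onto 4 principal components first.
with_pca = make_pipeline(
    TfidfVectorizer(),
    FunctionTransformer(to_dense, accept_sparse=True),
    PCA(n_components=4, random_state=0),
    LogisticRegression(max_iter=1000),
)

# Same 3-fold stratified split for both pipelines, scored by accuracy.
scores_without = cross_val_score(without_pca, texts, labels, cv=3, scoring="accuracy")
scores_with = cross_val_score(with_pca, texts, labels, cv=3, scoring="accuracy")

print("mean accuracy without PCA:", scores_without.mean())
print("mean accuracy with PCA:", scores_with.mean())
```

On a dataset this small the two means say little; the point is only that both pipelines are scored on identical folds, so any difference can be attributed to the PCA step.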
This project highlights the use of machine learning in real-world applications such as spam detection, where fast and accurate text classification is critical for communication security and user experience.