whether a Windows machine would be infected with malware, helping to prevent future attacks.
The challenge was based on a Kaggle competition and involved analyzing a large dataset provided by Microsoft. The dataset included telemetry data from millions of machines, describing their configuration, operating systems, and other properties.
We worked on handling the large and imbalanced dataset, cleaning it, and performing exploratory data analysis to extract meaningful insights.
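As an illustration of one common way to handle class imbalance with gradient-boosted trees, the sketch below computes the negative-to-positive ratio that is often passed to XGBoost's `scale_pos_weight` parameter. The post does not say which technique we used, so treat this as an assumption; the helper name `negative_to_positive_ratio` and the toy labels are hypothetical.

```python
from collections import Counter


def negative_to_positive_ratio(labels):
    """Return count(label == 0) / count(label == 1) for binary labels."""
    counts = Counter(labels)
    negatives, positives = counts[0], counts[1]
    if positives == 0:
        raise ValueError("labels contain no positive examples")
    return negatives / positives


# Toy labels (hypothetical): 8 clean machines, 2 infected ones.
labels = [0] * 8 + [1] * 2
weight = negative_to_positive_ratio(labels)
print(weight)
```

A ratio above 1 means the positive class is the minority, and passing it as `scale_pos_weight` tells the model to weight positive examples more heavily.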
We used an XGBoost model for prediction and achieved high accuracy, then evaluated it further with a confusion matrix, precision, recall, and F1-score.
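To show how these evaluation metrics relate to each other, here is a minimal sketch in plain Python that derives them from the confusion-matrix counts. The function names and the toy labels are hypothetical and are not our actual results; in practice a library such as scikit-learn would normally do this.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = infected)."""
    tp = fp = tn = fn = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 0 and predicted == 1:
            fp += 1
        elif actual == 0 and predicted == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def classification_metrics(y_true, y_pred):
    """Compute accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Toy example (hypothetical labels, not the project's data).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

On an imbalanced dataset, precision, recall, and F1 are usually more informative than accuracy alone, because a model that always predicts the majority class can still score a high accuracy.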
To make the model user-friendly, we deployed it using Streamlit, creating a simple web interface for real-time predictions on whether a machine might be infected with malware.
This project was an amazing opportunity to develop my skills in data manipulation, machine learning, and model deployment. I’m grateful to my team for the collaboration, and to Eng. Mohamed Ibrahim for his guidance throughout the process.