I am pleased to share the Breast Cancer Classification Project using Machine Learning, in which several models and techniques were used to analyze medical data and classify tumors as malignant or benign based on a set of clinical characteristics.
Models used:
Several machine learning models were applied and compared to find the one with the highest accuracy, including:
Logistic Regression: A simple and fast model that separates the classes with a linear decision boundary.
Support Vector Machine (SVC): Tested with multiple types of kernels (linear, RBF, and polynomial) to improve performance.
Random Forest: Used to build multiple decision trees and combine their results to reduce errors.
Gradient Boosting: An ensemble that adds decision trees one at a time, each correcting the errors of the trees before it.
K-Nearest Neighbors (KNN): A model that classifies each sample by the labels of its nearest neighbors in feature space.
XGBoost: An optimized gradient boosting implementation designed for fast training, especially on large datasets.
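The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: it assumes scikit-learn's built-in breast cancer dataset as a stand-in for the project data, uses default hyperparameters, scores with 5-fold cross-validated accuracy, and leaves out XGBoost so that the sketch depends on scikit-learn only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the project data.
X, y = load_breast_cancer(return_X_y=True)

# Distance- and margin-based models get a scaler; tree ensembles do not need one.
models = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svc_linear": make_pipeline(StandardScaler(), SVC(kernel="linear")),
    "svc_rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "svc_poly": make_pipeline(StandardScaler(), SVC(kernel="poly")),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "gradient_boosting": GradientBoostingClassifier(random_state=42),
    "knn": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
}

# Mean 5-fold cross-validated accuracy for each model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in models.items()
}

# Print the models from best to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Putting the scaler inside each pipeline means it is refit on the training folds only, so no information from the validation fold leaks into preprocessing.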
Libraries used:
Several libraries were relied upon to develop the project and achieve its goals, including:
NumPy and Pandas: For data analysis and processing.
Matplotlib and Seaborn: To create graphs and visualize data.
Scikit-learn: To apply models, evaluate their results, and tune hyperparameters with search techniques such as GridSearchCV and RandomizedSearchCV.
Imbalanced-learn: To handle data imbalance using SMOTE.
XGBoost: To apply advanced boosting in classification.
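As an illustration of the hyperparameter search mentioned above, here is a small GridSearchCV sketch. It is an assumption-laden example rather than the project's configuration: the dataset is scikit-learn's built-in breast cancer data, and the parameter grid, the 80/20 split, and the random seed are chosen purely for demonstration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Assumption: scikit-learn's bundled dataset stands in for the project data.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale features, then fit a support vector classifier.
pipeline = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Illustrative grid: the three kernels named in the post and a few values of C.
param_grid = {
    "svc__kernel": ["linear", "rbf", "poly"],
    "svc__C": [0.1, 1, 10],
}

# Exhaustive search over the grid with 5-fold cross-validation on the training set.
search = GridSearchCV(pipeline, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)

# Accuracy of the best configuration on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("best parameters:", search.best_params_)
print(f"test accuracy: {test_accuracy:.3f}")
```

RandomizedSearchCV takes the same estimator and a similar parameter specification but samples a fixed number of combinations instead of trying all of them. If SMOTE were added, imbalanced-learn's own Pipeline class would be the usual place for it, so that resampling is applied only to the training folds.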
Data Analysis and Graphical Visualizations:
Exploratory Data Analysis (EDA) was used to clarify important relationships and features. The analyses included:
Distribution plots using Seaborn to examine how individual features are distributed.
Correlation matrix to identify relationships between features.