Classification

تفاصيل العمل

Breast Cancer Classification with Logistic Regression

I recently completed a project using the Breast Cancer Wisconsin Diagnostic Dataset to build a predictive model for classifying tumors as Benign (B) or Malignant (M).

Key Steps:

Dataset: 569 samples, 30 numerical features describing tumor cell nuclei.

Preprocessing:

Encoded diagnosis (M=1, B=0)

Removed irrelevant ID column

Standardized all features with StandardScaler

Train-Test Split: 80% train / 20% test (stratified to maintain class balance).

? Model:

Logistic Regression (with class_weight="balanced" to handle class imbalance).

Results:

Accuracy: 97.4%

AUC (ROC Curve): 0.995

Confusion Matrix:

True Negatives: 71

False Positives: 1

False Negatives: 2

True Positives: 40

Insights:

Both training (98.7%) and testing (97.4%) accuracies were high and close → no overfitting.

The model performs very well, but False Negatives (missed malignant cases) remain the most critical error type in medical diagnosis, as they could delay treatment.

False Positives, while less harmful, may lead to unnecessary stress and additional tests.

This project demonstrates how even simple models like Logistic Regression can deliver powerful, reliable results in healthcare-related tasks.

بطاقة العمل

اسم المستقل

Samir N.

عدد الإعجابات

عدد المشاهدات

تاريخ الإضافة

25/08/2025

المهارات

Classification

تفاصيل العمل

بطاقة العمل

روابط

تابع مستقل على

وسائل الدفع المتاحة