Breast Cancer Classification Using Deep Feature Extraction and Machine Learning

Keywords: Pre-Trained Model, Machine Learning, Image Processing, Imbalanced Data, Feature Extraction, Quantitative Evaluation

Early and accurate breast cancer diagnosis is critical, yet it remains challenging in routine practice. This study proposes a simple, reproducible pipeline that combines deep feature extraction from pre-trained CNNs (ResNet50, VGG16, EfficientNet-B0, DenseNet121, MobileNetV2) with classical machine-learning classifiers (logistic regression, SVM, k-NN, decision tree, random forest, gradient boosting, XGBoost, LightGBM, Naïve Bayes, and MLP). Features are extracted after standardized preprocessing, and class imbalance, where present, is addressed with SMOTE. We evaluate the pipeline on three image datasets (binary and multiclass) using accuracy, precision, recall (sensitivity), F1 score, and confusion matrices, and we apply paired statistical tests across cross-validation splits.

Findings: EfficientNet-B0+MLP and ResNet50+MLP reach accuracies of up to 99.6% on high-quality, balanced data, while DenseNet121+MLP with SMOTE attains 97.8% on imbalanced multiclass data. SMOTE yields substantial gains on imbalanced data and has a negligible effect on balanced sets; decision trees consistently underperform the other classifiers.

Novelty/Improvement: Rather than a monolithic end-to-end network, we provide a modular, resource-aware blueprint that (i) decouples feature extraction from classification, (ii) quantifies when imbalance correction matters, and (iii) reports clinically relevant error types. We further outline explainability with Grad-CAM and SHAP and discuss inference-time trade-offs and integration into real-world clinical workflows, offering an interpretable, deployment-friendly alternative to heavier end-to-end models.
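To make the imbalance-correction step concrete, the sketch below shows the core idea of SMOTE: each synthetic minority sample is placed at a random point on the line segment between a real minority sample and one of its k nearest minority neighbours. It is a minimal, standard-library illustration under stated assumptions, not the authors' code: feature vectors are assumed to be plain lists of floats (for example, pooled CNN embeddings), and the function name `smote`, the default `k=5`, and the toy data are hypothetical.

```python
import random


def _sq_dist(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def smote(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples for ONE minority class (hypothetical helper).

    minority: list of feature vectors (lists of floats), e.g. CNN embeddings.
    Each synthetic point lies between a randomly chosen minority sample and
    one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot use more neighbours than exist

    # Precompute the k nearest minority neighbours of every minority sample.
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _sq_dist(x, minority[j]))
        neighbours.append(others[:k])

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))   # pick a real minority sample
        j = rng.choice(neighbours[i])      # pick one of its near neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        x, nb = minority[i], minority[j]
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nb)])
    return synthetic


# Hypothetical toy example: three 2-D "embeddings" from a minority class.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
synthetic = smote(minority, n_new=4, k=2, seed=42)
for point in synthetic:
    print(point)
```

Because every synthetic point is an interpolation, it stays inside the local neighbourhood of the minority class. In cross-validation, oversampling is normally applied to the training fold only, so that synthetic neighbours of test samples cannot leak into evaluation; a real pipeline would more likely rely on a maintained implementation such as the `SMOTE` class in imbalanced-learn.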
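The evaluation metrics named above can all be derived from a single confusion matrix, which is also where the clinically relevant error types become visible. The following minimal sketch (standard library only) computes accuracy and per-class precision, recall (sensitivity), and F1 from a matrix whose rows are true classes and whose columns are predicted classes. The helper names, the `benign`/`malignant` labels, and the toy predictions are hypothetical and serve only as an illustration.

```python
def confusion_matrix(y_true, y_pred, labels):
    """Rows = true class, columns = predicted class (hypothetical helper)."""
    index = {lab: i for i, lab in enumerate(labels)}
    cm = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        cm[index[t]][index[p]] += 1
    return cm


def classification_metrics(cm):
    """Return (accuracy, [(precision, recall, f1) per class]) from a confusion matrix."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(n))
    per_class = []
    for c in range(n):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(n)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # recall is also called sensitivity
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        per_class.append((precision, recall, f1))
    return correct / total, per_class


# Hypothetical binary example.
labels = ["benign", "malignant"]
y_true = ["benign", "benign", "malignant", "malignant", "malignant"]
y_pred = ["benign", "malignant", "malignant", "malignant", "benign"]
cm = confusion_matrix(y_true, y_pred, labels)
accuracy, per_class = classification_metrics(cm)
macro_f1 = sum(f1 for _, _, f1 in per_class) / len(per_class)
# Off-diagonal cell: malignant cases predicted as benign (missed cancers).
missed_malignant = cm[labels.index("malignant")][labels.index("benign")]
print(cm)        # [[1, 1], [1, 2]]
print(accuracy)  # 0.6
```

In this toy case, `missed_malignant` (false negatives for the malignant class) is the error type that usually matters most clinically, which is one reason per-class recall and the full confusion matrix are more informative than accuracy alone.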
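The abstract does not name the paired test used. Purely as an illustration, the sketch below assumes a paired t-test on per-fold scores of two models evaluated on the same cross-validation splits: it takes the per-fold differences and divides their mean by their standard error. The function name and the fold accuracies are hypothetical, not results from the study.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Paired t statistic for two models scored on the SAME folds (hypothetical helper).

    Returns (t, degrees_of_freedom); a positive t means model A scored higher on average.
    """
    if len(scores_a) != len(scores_b) or len(scores_a) < 2:
        raise ValueError("need two equal-length score lists with at least two folds")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    mean_d = statistics.mean(diffs)
    sd_d = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_d == 0:
        raise ValueError("fold differences are constant; t is undefined")
    t = mean_d / (sd_d / math.sqrt(len(diffs)))
    return t, len(diffs) - 1


# Hypothetical 5-fold accuracies for two feature-extractor/classifier pairs.
model_a = [0.90, 0.92, 0.91, 0.93, 0.94]
model_b = [0.88, 0.91, 0.90, 0.90, 0.92]
t, df = paired_t_statistic(model_a, model_b)
print(f"t = {t:.2f}, df = {df}")
```

The statistic is then compared with a t distribution on `df` degrees of freedom; `scipy.stats.ttest_rel` computes the same statistic together with a p-value. Because training sets overlap across cross-validation folds, fold scores are not independent and this plain test tends to be optimistic, so corrected variants (such as the corrected resampled t-test) are often preferred.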