Breast Cancer Classification Using Deep Feature Extraction and Machine Learning
Early and accurate breast cancer diagnosis remains critical yet challenging in routine practice. This study proposes a simple, reproducible pipeline that combines deep feature extraction from pre-trained CNNs (ResNet50, VGG16, EfficientNet-B0, DenseNet121, MobileNetV2) with classical machine-learning classifiers (logistic regression, SVM, k-NN, decision tree, random forest, gradient boosting, XGBoost, LightGBM, Naïve Bayes, and MLP). Features are computed after standardized preprocessing, and class imbalance, where present, is addressed with SMOTE. We evaluate the pipeline on three image datasets (binary and multiclass) using accuracy, precision, recall/sensitivity, F1, and confusion matrices, and apply paired statistical tests across cross-validation splits.

Findings: EfficientNet-B0+MLP and ResNet50+MLP achieve accuracies of up to 99.6% on high-quality, balanced data, while DenseNet121+MLP with SMOTE attains 97.8% on imbalanced multiclass data. SMOTE yields substantial gains on imbalanced data and has a negligible effect on balanced sets; decision trees underperform consistently.

Novelty/Improvement: Rather than a monolithic end-to-end network, we provide a modular, resource-aware blueprint that (i) disentangles feature extraction from classification, (ii) quantifies when imbalance correction matters, and (iii) reports clinically relevant error types. We further outline explainability with Grad-CAM/SHAP and discuss inference-time trade-offs and real-world workflow integration, offering an interpretable and deployment-friendly alternative to heavier end-to-end models.
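The abstract names SMOTE as the imbalance correction but gives no implementation details. As a rough illustration only, the interpolation idea behind SMOTE can be sketched in NumPy on stand-in feature vectors. Everything here is an assumption made for the example: the helper name `smote_oversample`, the 16-dimensional random vectors standing in for CNN embeddings, the 90/10 class split, and `k=5`. It is not the authors' code, and a practical pipeline would more likely use an established library implementation, applied to the training folds only so that synthetic samples never leak into evaluation data.

```python
import numpy as np


def smote_oversample(features, n_new, k=5, seed=0):
    """Create n_new synthetic rows by interpolating between minority samples.

    features: (n, d) array of minority-class feature vectors, n >= 2.
    Hypothetical helper for illustration; not taken from the paper.
    """
    rng = np.random.default_rng(seed)
    n = features.shape[0]
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    # Pairwise squared Euclidean distances within the minority class.
    diffs = features[:, None, :] - features[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diffs, diffs)
    np.fill_diagonal(dist, np.inf)  # a sample is not its own neighbour
    neighbours = np.argsort(dist, axis=1)[:, :k]  # (n, k) nearest indices

    base = rng.integers(0, n, size=n_new)  # seed sample for each new row
    pick = neighbours[base, rng.integers(0, k, size=n_new)]  # one random neighbour
    gap = rng.random((n_new, 1))  # interpolation weight in [0, 1)
    return features[base] + gap * (features[pick] - features[base])


# Assumed toy data: 90 majority and 10 minority feature vectors.
data_rng = np.random.default_rng(42)
majority = data_rng.normal(0.0, 1.0, size=(90, 16))
minority = data_rng.normal(2.0, 1.0, size=(10, 16))

# Generate enough synthetic minority rows to match the majority count.
synthetic = smote_oversample(minority, n_new=len(majority) - len(minority))
X_train = np.vstack([majority, minority, synthetic])
y_train = np.concatenate(
    [np.zeros(len(majority), dtype=int),
     np.ones(len(minority) + len(synthetic), dtype=int)]
)
print(np.bincount(y_train))  # per-class row counts after oversampling
```

Each synthetic row lies on the line segment between a minority sample and one of its nearest minority neighbours, which is why the technique adds little when classes are already balanced: there is no shortfall to fill.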
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.






















