Breast Cancer Classification Using Deep Feature Extraction and Machine Learning

Raed Alazaidah; Ghassan Samara; Hamza Abu Asi; Suhaila Abuowaida; Hamza A. Mashagba; Azlan Abd Aziz; Samia Larguech; Samir S. Al-Bawri

doi:10.28991/HIJ-2025-06-04-09

Authors

Raed Alazaidah (razaidah@zu.edu.jo) Department of Data Science and AI, Faculty of Information Technology, Zarqa University, Zarqa 13110, Jordan https://orcid.org/0000-0002-1818-4288
Ghassan Samara Department of Computer Science, Faculty of Information Technology, Zarqa University, Zarqa 13110, Jordan https://orcid.org/0000-0001-8415-0572
Hamza Abu Asi Department of Data Science and AI, Faculty of Information Technology, Zarqa University, Zarqa 13110, Jordan
Suhaila Abuowaida Department of Data Science and Artificial Intelligence, Faculty of Prince Al-Hussein Bin Abdallah II for IT, Al al-Bayt University, Mafraq, Jordan https://orcid.org/0000-0002-2030-4250
Hamza A. Mashagba Faculty of Engineering and Technology, Centre for Wireless Technology (CWT), Multimedia University, Melaka, 75450, Malaysia https://orcid.org/0000-0002-2952-2617
Azlan Abd Aziz Faculty of Engineering and Technology, Centre for Wireless Technology (CWT), Multimedia University, Melaka, 75450, Malaysia https://orcid.org/0000-0003-2255-0831
Samia Larguech Department of Electrical Engineering, College of Engineering, Princess Nourah Bint Abdulrahman University, Riyadh 11671, Saudi Arabia https://orcid.org/0000-0001-5118-1639
Samir S. Al-Bawri Space Science Centre, Institute of Climate Change, Universiti Kebangsaan Malaysia (UKM), Bangi 43600, Malaysia https://orcid.org/0000-0003-2852-575X

Vol. 6 No. 4 (2025): December

Research Articles

Abstract

Early and accurate breast cancer diagnosis remains critical yet challenging in routine practice. This study proposes a simple, reproducible pipeline that combines deep feature extraction from pre-trained CNNs (ResNet50, VGG16, EfficientNet-B0, DenseNet121, MobileNetV2) with classical machine-learning classifiers (logistic regression, SVM, k-NN, decision tree, random forest, gradient boosting, XGBoost, LightGBM, Naïve Bayes, and MLP). Features are computed after standardized preprocessing, and class imbalance, where present, is addressed with SMOTE. We evaluate three image datasets (binary and multiclass) using accuracy, precision, recall/sensitivity, F1, and confusion matrices, and apply paired statistical tests across cross-validation splits. Findings: EfficientNet-B0+MLP and ResNet50+MLP achieve accuracies of up to 99.6% on high-quality, balanced data, while DenseNet121+MLP with SMOTE attains 97.8% on imbalanced multiclass data. SMOTE yields substantial gains on imbalanced data but has a negligible effect on balanced sets; decision trees underperform consistently. Novelty/Improvement: Rather than a monolithic end-to-end network, we provide a modular, resource-aware blueprint that (i) disentangles feature extraction from classification, (ii) quantifies when imbalance correction matters, and (iii) reports clinically relevant error types. We further outline explainability with Grad-CAM/SHAP and discuss inference-time trade-offs and real-world workflow integration, offering an interpretable and deployment-friendly alternative to heavier end-to-end models.
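The imbalance-correction step named in the abstract can be illustrated with a minimal, dependency-free sketch of SMOTE-style oversampling. This is not the authors' implementation: the function names `smote` and `balance`, the default `k=5`, the fixed seed, and the toy two-dimensional vectors standing in for CNN feature vectors are all assumptions made for illustration. The idea shown is only the core one: each synthetic minority sample is placed at a random point on the line segment between a minority sample and one of its nearest minority neighbours.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority samples.

    Hypothetical helper: `minority` is a list of equal-length feature vectors.
    """
    if n_synthetic == 0:
        return []
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


def balance(features, labels, k=5, seed=0):
    """Oversample every class up to the size of the largest class."""
    by_class = {}
    for x, y in zip(features, labels):
        by_class.setdefault(y, []).append(list(x))
    target = max(len(rows) for rows in by_class.values())
    out_x, out_y = [], []
    for y, rows in by_class.items():
        extra = smote(rows, target - len(rows), k=k, seed=seed)
        for row in rows + extra:
            out_x.append(row)
            out_y.append(y)
    return out_x, out_y


if __name__ == "__main__":
    # Toy stand-ins for deep features: 3 minority vs 6 majority samples.
    X = [[0.0, 0.0], [0.1, 0.2], [0.2, 0.1],
         [5.0, 5.0], [5.1, 5.2], [4.9, 5.1],
         [5.2, 4.8], [5.0, 5.3], [4.8, 4.9]]
    y = [1, 1, 1, 0, 0, 0, 0, 0, 0]
    X_bal, y_bal = balance(X, y, k=2)
    print(len(X_bal), y_bal.count(0), y_bal.count(1))
```

In a cross-validated setting, a common precaution is to call something like `balance` on the training portion of each split only, so that synthetic points derived from held-out samples do not leak into evaluation; whether the study does this is not stated in the abstract.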
