A Novel Hybrid ViT-CNN Approach for Pneumonia and Lung Opacity Detection in X-Ray Images
This study presents a novel hybrid deep learning architecture that combines a Vision Transformer (ViT) with a Convolutional Neural Network (CNN) to automatically classify chest X-ray images into three diagnostic categories: Normal, Lung Opacity, and Viral Pneumonia. By fusing the ResNet-18 CNN's strength in local texture analysis with the ViT's capacity for global feature representation, the proposed model addresses the limitations of single-architecture systems. In experimental evaluations, the hybrid ViT-CNN architecture outperforms the state-of-the-art methods it is compared against, achieving 94.2% classification accuracy, with precision, recall, and F1-scores consistently above 94% for most categories. The model also performs strongly in difficult cases where conventional methods often struggle, such as distinguishing Lung Opacity from Normal cases. Its discriminative ability is high, with AUC values above 0.95 for every class. The system is computationally efficient, producing a prediction in about 0.0012 seconds per image, which makes it well suited to real-time clinical deployment. Grad-CAM visualizations highlight the image regions that drive each diagnostic decision, supporting the model's interpretability. Overall, this work establishes a new benchmark for chest X-ray classification and offers a practical foundation for automated diagnostic assistance in resource-constrained healthcare settings.
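The abstract summarizes performance with overall accuracy, per-class precision, recall and F1, and per-class AUC. The sketch below shows one standard way to compute these for the three categories in plain Python. It assumes integer labels 0 to 2 in the order Normal, Lung Opacity, Viral Pneumonia, and a one-vs-rest definition of AUC; the abstract does not state how multi-class AUC was computed, and the toy scores are hypothetical, not the study's data.

```python
from typing import Dict, List, Sequence

# The three diagnostic categories named in the abstract; index = class label.
CLASSES = ["Normal", "Lung Opacity", "Viral Pneumonia"]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of images whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_metrics(
    y_true: Sequence[int], y_pred: Sequence[int], n_classes: int
) -> Dict[int, Dict[str, float]]:
    """Precision, recall and F1 for each class, treating that class as positive."""
    results: Dict[int, Dict[str, float]] = {}
    for c in range(n_classes):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        results[c] = {"precision": precision, "recall": recall, "f1": f1}
    return results


def one_vs_rest_auc(
    y_true: Sequence[int], scores: Sequence[Sequence[float]], c: int
) -> float:
    """AUC of class c against all other classes (Mann-Whitney formulation).

    scores[i][c] is the model's score (e.g. softmax probability) for class c on
    image i. AUC is the probability that a random positive image is scored above
    a random negative one; ties count as half. O(n_pos * n_neg), fine for a sketch.
    """
    pos = [s[c] for t, s in zip(y_true, scores) if t == c]
    neg = [s[c] for t, s in zip(y_true, scores) if t != c]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative image")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical toy scores for six images, NOT data from the study.
    y_true = [0, 0, 1, 1, 2, 2]
    scores: List[List[float]] = [
        [0.8, 0.1, 0.1],
        [0.6, 0.3, 0.1],
        [0.2, 0.7, 0.1],
        [0.5, 0.4, 0.1],  # a Lung Opacity image predicted as Normal
        [0.1, 0.2, 0.7],
        [0.1, 0.1, 0.8],
    ]
    # Predicted label = arg-max score per image.
    y_pred = [max(range(len(CLASSES)), key=lambda k: s[k]) for s in scores]
    print(f"accuracy = {accuracy(y_true, y_pred):.3f}")
    for c, m in per_class_metrics(y_true, y_pred, len(CLASSES)).items():
        auc = one_vs_rest_auc(y_true, scores, c)
        print(f"{CLASSES[c]}: P={m['precision']:.2f} R={m['recall']:.2f} "
              f"F1={m['f1']:.2f} AUC={auc:.2f}")
```

On these toy scores, the Lung Opacity image misread as Normal lowers Lung Opacity recall to 0.50, yet that class's AUC stays at 1.00: AUC only measures whether positives are ranked above negatives, not where the arg-max decision falls. This is why accuracy-style metrics and AUC are usually reported together, as in the abstract.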
This work (including HTML and PDF files) is licensed under a Creative Commons Attribution 4.0 International License.