Contextual Semantic Embeddings Based on Transformer Models for Arabic Biomedical Questions Classification

Ismail Ait Talghalit, Hamza Alami, Said Ouatik El Alaoui

Abstract


Arabic biomedical question classification (ABQC) is a challenging task for several reasons, including the specialized jargon expressed in the Arabic language, the complex semantics of Arabic vocabulary, and the lack of specific datasets and corpora. Moreover, only a few ABQC studies take the context of words into account when representing questions. In this work, we propose a classification model designed for Arabic biomedical questions. We build vector representations that capture the contextual and semantic information of Arabic biomedical text, which presents numerous challenges, such as the derivational morphology of the Arabic language, the specialized terminology of the biomedical domain, and the lack of capitalization in Arabic script. Our representation adapts the extensive knowledge encoded in BERT (Bidirectional Encoder Representations from Transformers) and other transformer models to address these challenges. Several experiments were conducted on a dedicated Arabic biomedical dataset, MAQA, with well-known transformer models, including BERT, AraBERT, BioBERT, RoBERTa, and DistilBERT, fine-tuned for the classification task. The results show that our method achieves remarkable performance, with an accuracy of 93.31% and an F1-score of 93.35%.
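As a concrete illustration of the pipeline described above, the sketch below fine-tunes a pre-trained transformer checkpoint for Arabic question classification with the Hugging Face Transformers library. This is a minimal sketch, not the authors' released code: the AraBERT checkpoint name, the three-way label set, and the hyperparameters are illustrative assumptions, and the loop trains on a single toy batch.

# Minimal fine-tuning sketch (illustrative assumptions, not the authors' code).
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "aubmindlab/bert-base-arabertv02"    # assumed AraBERT checkpoint
LABELS = ["diagnosis", "treatment", "prevention"]  # hypothetical question types

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=len(LABELS))

# Toy training pair: an Arabic medical question and its class index.
questions = ["ما هي أعراض مرض السكري؟"]  # "What are the symptoms of diabetes?"
labels = torch.tensor([0])

# Tokenize into contextual subword pieces; the [CLS] representation feeds
# the classification head added on top of the pre-trained encoder.
batch = tokenizer(questions, truncation=True, padding=True, max_length=128, return_tensors="pt")

optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)
model.train()
for _ in range(3):  # a few gradient steps on the toy batch
    outputs = model(**batch, labels=labels)
    outputs.loss.backward()
    optimizer.step()
    optimizer.zero_grad()

# Inference: predict the class of a new question.
model.eval()
with torch.no_grad():
    enc = tokenizer("ما هو علاج ارتفاع ضغط الدم؟", return_tensors="pt")
    pred = model(**enc).logits.argmax(dim=-1)
print(LABELS[int(pred)])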

 

DOI: 10.28991/HIJ-2024-05-04-011

Full Text: PDF


Keywords


Arabic Question Classification; Biomedical Domain; Natural Language Processing; Transformers; BERT; Fine-Tuning; Question Answering Systems; Sentence Embedding.

References


Sarrouti, M., & El Alaoui, S. O. (2017). A machine learning-based method for question type classification in biomedical question answering. Methods of Information in Medicine, 56(3), 209–216. doi:10.3414/ME16-01-0116.

Xu, S., Cheng, G., & Kong, F. (2016). Research on question classification for automatic question answering. 2016 International Conference on Asian Language Processing (IALP), 218–221. doi:10.1109/IALP.2016.7875972.

Babu, A., & Boddu, S. B. (2024). BERT-Based Medical Chatbot: Enhancing Healthcare Communication through Natural Language Understanding. Exploratory Research in Clinical and Social Pharmacy, 13, 100419. doi:10.1016/j.rcsop.2024.100419.

Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019). BERT: Pre-training of deep bidirectional transformers for language understanding. Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT 2019), 1, 4171–4186.

Tama, B. A., & Lim, S. (2020). A comparative performance evaluation of classification algorithms for clinical decision support systems. Mathematics, 8(10), 1–24. doi:10.3390/math8101814.

Hassan, E., Abd El-Hafeez, T., & Shams, M. Y. (2024). Optimizing classification of diseases through language model analysis of symptoms. Scientific Reports, 14(1). doi:10.1038/s41598-024-51615-5.

Momtazi, S. (2018). Unsupervised Latent Dirichlet Allocation for supervised question classification. Information Processing and Management, 54(3), 380–393. doi:10.1016/j.ipm.2018.01.001.

Hamza, A., En-Nahnahi, N., Zidani, K. A., & El Alaoui Ouatik, S. (2021). An Arabic question classification method based on new taxonomy and continuous distributed representation of words. Journal of King Saud University - Computer and Information Sciences, 33(2), 218–224. doi:10.1016/j.jksuci.2019.01.001.

Mikolov, T., Chen, K., Corrado, G., & Dean, J. (2013). Efficient estimation of word representations in vector space. 1st International Conference on Learning Representations, ICLR 2013 - Workshop Track Proceedings, 1–12.

Rush, A. M., Chopra, S., & Weston, J. (2015). A neural attention model for sentence summarization. Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP 2015), 379–389.

Aggarwal, C. C., & Zhai, C. X. (2012). A survey of text clustering algorithms. Mining Text Data, Springer, Boston, MA, 77–128. doi:10.1007/978-1-4614-3223-4_4.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed Representations of Words and Phrases and their Compositionality. ArXiv Preprint, ArXiv:1310.4546. doi:10.48550/arXiv.1310.4546.

Pennington, J., Socher, R., & Manning, C. D. (2014). GloVe: Global vectors for word representation. Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP 2014), 1532–1543. doi:10.3115/v1/d14-1162.

Bojanowski, P., Grave, E., Joulin, A., & Mikolov, T. (2017). Enriching Word Vectors with Subword Information. Transactions of the Association for Computational Linguistics, 5, 135–146. doi:10.1162/tacl_a_00051.

Zhang, Y., Chen, Q., Yang, Z., Lin, H., & Lu, Z. (2019). BioWordVec, improving biomedical word embeddings with subword information and MeSH. Scientific Data, 6(1), 52. doi:10.1038/s41597-019-0055-0.

Lahbari, I., & El Alaoui, S. O. (2024). Exploring Sentence Embedding Representation for Arabic Question Answering. International Journal of Computing and Digital Systems, 15(1), 1229–1241. doi:10.12785/ijcds/150187.

Antoun, W., Baly, F., & Hajj, H. (2020). AraBERT: Transformer-based model for Arabic language understanding. arXiv preprint, arXiv:2003.00104. doi:10.48550/arXiv.2003.00104.

Abdelhay, M., & Mohammed, A. (2022). MAQA: Medical Arabic Q&A Dataset. Harvard Dataverse, Cambridge, United States.

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint, arXiv:1907.11692. doi:10.48550/arXiv.1907.11692.

Lee, J., Yoon, W., Kim, S., Kim, D., Kim, S., So, C. H., & Kang, J. (2020). BioBERT: A pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4), 1234–1240. doi:10.1093/bioinformatics/btz682.

Sanh, V., Debut, L., Chaumond, J., & Wolf, T. (2019). DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv preprint, arXiv:1910.01108. doi:10.48550/arXiv.1910.01108.

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, Ł., & Polosukhin, I. (2017). Attention is all you need. Advances in Neural Information Processing Systems, 30, 5999–6009.

Mutabazi, E., Ni, J., Tang, G., & Cao, W. (2023). An Improved Model for Medical Forum Question Classification Based on CNN and BiLSTM. Applied Sciences (Switzerland), 13(15), 8623. doi:10.3390/app13158623.

Vihikan, W. O., & Trisna, I. N. P. Indonesian health question multi-class classification based on deep learning. Journal of Information Systems and Informatics, 6(3), 1931–1944.

Mansour, M., Tohamy, M., Ezzat, Z., & Torki, M. (2020). Arabic Dialect Identification Using BERT Fine-Tuning. Proceedings of the Fifth Arabic Natural Language Processing Workshop, 308–312.

Boudjellal, N., Zhang, H., Khan, A., Ahmad, A., Naseem, R., Shang, J., & Dai, L. (2021). ABioNER: A BERT-Based Model for Arabic Biomedical Named-Entity Recognition. Complexity, 2021, 1–6. doi:10.1155/2021/6633213.

Zafar, A., Sahoo, S. K., Varshney, D., Das, A., & Ekbal, A. (2024). KIMedQA: towards building knowledge-enhanced medical QA models. Journal of Intelligent Information Systems, 62(3), 833–858. doi:10.1007/s10844-024-00844-1.

Hammoud, J., Vatian, A., Dobrenko, N., Vedernikov, N., Shalyto, A., & Gusarova, N. (2021). New Arabic Medical Dataset for Diseases Classification. Lecture Notes in Computer Science, 13113, 196–203. doi:10.1007/978-3-030-91608-4_20.

Al-Smadi, B. S. (2024). DeBERTa-BiLSTM: A multi-label classification model of Arabic medical questions using pre-trained models and deep learning. Computers in Biology and Medicine, 170, 107921. doi:10.1016/j.compbiomed.2024.107921.

Yu, H., Liu, C., Zhang, L., Wu, C., Liang, G., Escorcia-Gutierrez, J., & Ghoneim, O. A. (2023). An intent classification method for questions in “Treatise on Febrile diseases” based on TinyBERT-CNN fusion model. Computers in Biology and Medicine, 162, 107075. doi:10.1016/j.compbiomed.2023.107075.

Kofi Akpatsa, S., Lei, H., Li, X., Kofi Setornyo Obeng, V.-H., Mensah Martey, E., Clement Addo, P., & Dodzi Fiawoo, D. (2022). Online News Sentiment Classification Using DistilBERT. Journal of Quantum Computing, 4(1), 1–11. doi:10.32604/jqc.2022.026658.

Aftan, S., & Shah, H. (2023). Using the AraBERT Model for Customer Satisfaction Classification of Telecom Sectors in Saudi Arabia. Brain Sciences, 13(1), 147. doi:10.3390/brainsci13010147.

El-Alami, F.-Z., Ouatik El Alaoui, S., & En Nahnahi, N. (2022). Contextual semantic embeddings based on fine-tuned AraBERT model for Arabic text multi-class categorization. Journal of King Saud University - Computer and Information Sciences, 34(10), 8422–8428. doi:10.1016/j.jksuci.2021.02.005.

Houssein, E. H., Mohamed, R. E., Hu, G., & Ali, A. A. (2024). Adapting transformer-based language models for heart disease detection and risk factors extraction. Journal of Big Data, 11(1). doi:10.1186/s40537-024-00903-y.

Abdelali, A., Darwish, K., Durrani, N., & Mubarak, H. (2016). Farasa: A fast and furious segmenter for Arabic. 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Proceedings of the Demonstrations Session, 11–16. doi:10.18653/v1/n16-3003.

Sun, C., Qiu, X., Xu, Y., & Huang, X. (2019). How to Fine-Tune BERT for Text Classification? Lecture Notes in Computer Science, 11856, 194–206. doi:10.1007/978-3-030-32381-3_16.

Abdelhay, M., Mohammed, A., & Hefny, H. A. (2023). Deep learning for Arabic healthcare: MedicalBot. Social Network Analysis and Mining, 13(1), 71. doi:10.1007/s13278-023-01077-w.




Copyright (c) 2024 Ismail Ait Talghalit