Contextual Semantic Embeddings Based on Transformer Models for Arabic Biomedical Questions Classification

Arabic Question Classification; Biomedical Domain; Natural Language Processing; Transformers; BERT; Fine-Tuning; Question Answering Systems; Sentence Embedding.

Authors

  • Ismail Ait Talghalit
    ismail.aittalghalit@uit.ac.ma
    Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, 14000, Morocco
  • Hamza Alami
    LISAC Laboratory, Faculty of Sciences Dhar El Mahraz, Sidi Mohamed Ben Abdellah University, Fez, 30003, Morocco
  • Said Ouatik El Alaoui
    Engineering Sciences Laboratory, National School of Applied Sciences, Ibn Tofail University, Kenitra, 14000, Morocco

Arabic biomedical question classification (ABQC) is a challenging task for several reasons, including the specialized jargon expressed in Arabic, the complex semantics of Arabic vocabulary, and the lack of dedicated datasets and corpora. Few studies take word context into account when representing questions for ABQC. In this work, we propose a classification model designed for Arabic biomedical questions. We build vector representations that capture the contextual and semantic information of Arabic biomedical text, which presents numerous challenges, such as the derivational morphology of Arabic, the specialized biomedical terminology, and the lack of capitalization. Our representation adapts the extensive knowledge encoded in BERT (Bidirectional Encoder Representations from Transformers) and other transformer models to address these challenges. We conducted several experiments on a dedicated Arabic biomedical dataset, MAQA, with well-known transformer models, including BERT, AraBERT, BioBERT, RoBERTa, and DistilBERT, fine-tuned for the classification task. The results show that our method achieves an accuracy of 93.31% and an F1-score of 93.35%.
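For readers who want to see how the two reported metrics are defined, the following is a minimal sketch of accuracy and F1-score for a multi-class classifier. It is not the authors' evaluation code. The abstract does not say how F1 is averaged across classes, so the weighted default (with a macro option) is an assumption, and the class labels in the usage note are hypothetical rather than the actual MAQA categories.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of questions whose predicted class equals the gold class."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def f1_score(y_true, y_pred, average="weighted"):
    """Multi-class F1. The averaging mode is an assumption, not from the paper.

    "macro" gives every class equal weight; "weighted" weights each class
    by its number of gold examples.
    """
    labels = sorted(set(y_true) | set(y_pred))
    support = Counter(y_true)
    per_class = {}
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
        per_class[label] = 2 * tp / denom if denom else 0.0
    if average == "macro":
        return sum(per_class.values()) / len(labels)
    return sum(per_class[l] * support[l] for l in labels) / len(y_true)


# Hypothetical labels, for illustration only.
gold = ["diagnosis", "treatment", "diagnosis", "symptom"]
pred = ["diagnosis", "treatment", "symptom", "symptom"]
print(accuracy(gold, pred))
print(f1_score(gold, pred, average="macro"))
```

Because F1 is averaged over per-class scores, it can differ slightly from accuracy on the same predictions, and may be either lower or higher depending on the averaging mode and class balance.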

 

Doi: 10.28991/HIJ-2024-05-04-011
