Cross-Language Approach for Quranic QA
Journal:
arXiv
Published Date:
Jan 29, 2025
Abstract
Question answering systems face critical limitations in languages with
limited resources and scarce data, making the development of robust models
especially challenging. Quranic QA systems are significant because they
facilitate a deeper understanding of the Quran, a holy text for over a
billion people worldwide. However, these systems face unique challenges,
including the linguistic disparity between questions written in Modern Standard
Arabic and answers found in Quranic verses written in Classical Arabic, and the
small size of existing datasets, which further restricts model performance. To
address these challenges, we adopt a cross-language approach by (1) Dataset
Augmentation: expanding and enriching the dataset through machine translation
to convert Arabic questions into English, paraphrasing questions to create
linguistic diversity, and retrieving answers from an English translation of the
Quran to align with multilingual training requirements; and (2) Language Model
Fine-Tuning: utilizing pre-trained models such as BERT-Medium, RoBERTa-Base,
DeBERTa-v3-Base, ELECTRA-Large, Flan-T5, Bloom, and Falcon to address the
specific requirements of Quranic QA. Experimental results demonstrate that this
cross-language approach significantly improves model performance, with
RoBERTa-Base achieving the highest MAP@10 (0.34) and MRR (0.52), while
DeBERTa-v3-Base excels in Recall@10 (0.50) and Precision@10 (0.24). These
findings underscore the effectiveness of cross-language strategies in
overcoming linguistic barriers and advancing Quranic QA systems.
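
For readers unfamiliar with the ranking metrics reported above, the following is a minimal sketch of how Precision@10, Recall@10, MAP@10, and MRR are commonly computed for a single question. The function names, the verse identifiers, and the choice of normalizing average precision by min(number of relevant items, k) are assumptions made for illustration; the abstract does not state which exact variant the authors used.

```python
from typing import Sequence, Set


def precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of the top-k slots that hold a relevant passage."""
    return sum(1 for doc in ranked[:k] if doc in relevant) / k


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Fraction of all relevant passages that appear in the top k."""
    if not relevant:
        return 0.0
    return sum(1 for doc in ranked[:k] if doc in relevant) / len(relevant)


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int = 10) -> float:
    """Average of precision values taken at each rank where a hit occurs.

    Assumption: normalized by min(len(relevant), k), one common convention.
    """
    hits = 0
    score = 0.0
    for rank, doc in enumerate(ranked[:k], start=1):
        if doc in relevant:
            hits += 1
            score += hits / rank
    denom = min(len(relevant), k)
    return score / denom if denom else 0.0


def reciprocal_rank(ranked: Sequence[str], relevant: Set[str]) -> float:
    """1 / rank of the first relevant passage, or 0 if none is retrieved."""
    for rank, doc in enumerate(ranked, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


# Hypothetical example: verse identifiers are illustrative only.
ranked = ["2:255", "1:1", "3:18", "112:1"]
relevant = {"1:1", "112:1"}

p10 = precision_at_k(ranked, relevant)
r10 = recall_at_k(ranked, relevant)
ap10 = average_precision_at_k(ranked, relevant)
rr = reciprocal_rank(ranked, relevant)
```

MAP@10 and MRR are then the means of `average_precision_at_k` and `reciprocal_rank` over all questions in the evaluation set.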