Dialogues of delivery: a multilingual question-answer dataset for maternal healthcare in East African languages.

Journal: BMC Research Notes

Abstract

OBJECTIVE: Domain-specific, clinically grounded Natural Language Processing (NLP) resources for African languages are critically scarce. In Western Uganda, linguistic diversity creates a barrier to maternal healthcare, as mothers lack access to health information in their native languages. This dataset aims to provide a high-quality, in-language medical corpus for developing and fine-tuning Large Language Models (LLMs) and conversational AI tools for maternal health in resource-constrained settings.

DATA DESCRIPTION: The "Dialogues of Delivery" dataset is a multilingual parallel corpus of 3,694 question-and-answer pairs, each presented in four languages: English, Luganda, Runyankore, and Swahili (approximately 14,800 entries in total). Primary data were collected through structured, open-ended questionnaires from 150 participants (expectant and postpartum mothers and maternal healthcare providers), recruited by facility-based convenience sampling at two health facilities in Western Uganda. The dataset underwent forward-backward translation by certified linguists and human-in-the-loop clinical validation by independent medical professionals. It covers core maternal health domains and provides a culturally and clinically validated foundation for Afrocentric AI development.
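
As an illustration of the parallel structure described above, the following is a minimal Python sketch of how one question-and-answer pair might be held in all four languages and expanded into per-language entries. The abstract does not specify the dataset's file format or field names, so the `ParallelQA` class, its fields, and the `flatten` helper are hypothetical, and the demo records contain placeholder strings rather than real dataset content.

```python
from dataclasses import dataclass

# The four language names come from the abstract; everything else is assumed.
LANGUAGES = ("English", "Luganda", "Runyankore", "Swahili")


@dataclass(frozen=True)
class ParallelQA:
    """Hypothetical record: one Q&A pair aligned across all languages."""
    pair_id: int
    questions: dict[str, str]  # language name -> question text
    answers: dict[str, str]    # language name -> answer text


def flatten(records):
    """Expand each parallel record into one entry per language."""
    entries = []
    for rec in records:
        for lang in LANGUAGES:
            entries.append({
                "pair_id": rec.pair_id,
                "language": lang,
                "question": rec.questions[lang],
                "answer": rec.answers[lang],
            })
    return entries


# Placeholder records only; no text here is taken from the dataset.
demo = [
    ParallelQA(
        pair_id=i,
        questions={lang: f"<question {i} in {lang}>" for lang in LANGUAGES},
        answers={lang: f"<answer {i} in {lang}>" for lang in LANGUAGES},
    )
    for i in range(3)
]

entries = flatten(demo)
print(len(entries))  # 3 pairs x 4 languages = 12 entries
```

Applied to 3,694 pairs, the same expansion would give 3,694 x 4 = 14,776 per-language entries, which is close to the 14,800 total reported in the abstract.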
