MultiMed-ST: Large-scale Many-to-many Multilingual Medical Speech Translation
Journal:
arXiv
Published Date:
Apr 4, 2025
Abstract
Multilingual speech translation (ST) in the medical domain enhances patient
care by enabling efficient communication across language barriers, alleviating
specialized workforce shortages, and facilitating improved diagnosis and
treatment, particularly during pandemics. In this work, we present, to the best
of our knowledge, the first systematic study on medical ST. First, we release
MultiMed-ST, a large-scale ST dataset for the medical domain that spans all
translation directions among five languages: Vietnamese, English, German,
French, and Chinese (in both Traditional and Simplified scripts), together with
the trained models. With 290,000 samples, it is the largest medical machine
translation (MT) dataset and the largest many-to-many multilingual ST dataset
across all domains.
Second, we present the most extensive analysis in ST research to date,
including empirical baselines, a bilingual vs. multilingual comparison, an
end-to-end vs. cascaded comparison, a task-specific vs. multi-task
sequence-to-sequence (seq2seq) comparison, a code-switching analysis, and a
quantitative and qualitative error analysis. All code, data, and models are
available online: https://github.com/leduckhai/MultiMed-ST.
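The phrase "all translation directions" can be made concrete with a little arithmetic: a many-to-many setup over n languages has n(n - 1) ordered source-to-target pairs. The Python sketch below enumerates them. The two-letter language codes are hypothetical labels chosen for illustration, not the identifiers used in the MultiMed-ST release, and whether the dataset counts the two Chinese scripts as separate directions is left open here.

```python
from itertools import permutations

# Hypothetical two-letter codes for the five languages named in the abstract.
# Chinese is counted once here, regardless of script (an assumption).
LANGUAGES = ["vi", "en", "de", "fr", "zh"]


def translation_directions(languages):
    """Return every ordered (source, target) pair with source != target."""
    return list(permutations(languages, 2))


directions = translation_directions(LANGUAGES)
print(len(directions))  # 20
print(directions[:3])   # [('vi', 'en'), ('vi', 'de'), ('vi', 'fr')]
```

Counting Chinese once gives 5 × 4 = 20 directions; listing Traditional and Simplified Chinese as separate entries would give 6 × 5 = 30 under the same enumeration.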