Efficient Adaptation of Multilingual Models for Japanese ASR
Journal: arXiv
Published Date: Dec 14, 2024
Abstract
This study explores fine-tuning multilingual automatic speech recognition (ASR)
models, specifically OpenAI's Whisper-Tiny, to improve performance on Japanese.
While multilingual models like Whisper offer versatility, they
often lack precision in specific languages. Conversely, monolingual models like
ReazonSpeech excel in language-specific tasks but are less adaptable. To bridge
this gap, we fine-tuned Whisper-Tiny on Japanese-specific datasets using two
strategies: Low-Rank Adaptation (LoRA) and end-to-end (E2E) fine-tuning.
Fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8
with LoRA and to 14.7 with E2E fine-tuning; the E2E result surpasses the larger
Whisper-Base's CER of 20.2. However, challenges with domain-specific terms
remain, highlighting the need for specialized datasets. These findings
demonstrate that fine-tuning multilingual models can achieve strong
language-specific performance while retaining their flexibility. This approach
provides a scalable solution for improving ASR in resource-constrained
environments and languages with complex writing systems like Japanese.
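
For readers unfamiliar with the metric, the following is a minimal sketch of how a Character Error Rate is commonly computed: the character-level Levenshtein distance between reference and hypothesis, divided by the reference length. It is an illustration only and not the evaluation code used in the study. The function names, the percentage scaling, and the example sentence are assumptions; the abstract does not state any text normalization applied before scoring.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Character-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))  # distances from an empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference characters to reach an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """CER as a percentage of reference characters (scaling is an assumption)."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def relative_reduction(before: float, after: float) -> float:
    """Relative error reduction, in percent, between two CER values."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    # Hypothetical example: the hypothesis drops one of seven characters.
    print(f"CER: {cer('今日は晴れです', '今日は晴です'):.1f}")
    # CER values taken from the abstract: baseline, LoRA, end-to-end.
    print(f"LoRA relative reduction: {relative_reduction(32.7, 20.8):.1f}%")
    print(f"E2E relative reduction:  {relative_reduction(32.7, 14.7):.1f}%")
```

Character-level scoring is the usual choice for Japanese because the script has no whitespace word boundaries, so a word error rate would depend on the choice of tokenizer.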