Efficient Adaptation of Multilingual Models for Japanese ASR

Journal: arXiv

Published Date: Dec 14, 2024

Abstract

This study explores fine-tuning multilingual automatic speech recognition (ASR) models, specifically OpenAI's Whisper-Tiny, to improve performance in Japanese. While multilingual models like Whisper offer versatility, they often lack precision in specific languages. Conversely, monolingual models like ReazonSpeech excel in language-specific tasks but are less adaptable. To bridge this gap, we fine-tuned Whisper-Tiny on Japanese-specific datasets using two approaches: Low-Rank Adaptation (LoRA) and end-to-end (E2E) training. Fine-tuning reduced Whisper-Tiny's Character Error Rate (CER) from 32.7 to 20.8 with LoRA and to 14.7 with E2E fine-tuning, the latter surpassing Whisper-Base's CER of 20.2. However, challenges with domain-specific terms remain, highlighting the need for specialized datasets. These findings demonstrate that fine-tuning multilingual models can achieve strong language-specific performance while retaining their flexibility. This approach provides a scalable solution for improving ASR in resource-constrained environments and for languages with complex writing systems like Japanese.
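
For readers unfamiliar with the metric reported above, CER is conventionally the character-level edit distance between a reference transcript and the model's hypothesis, divided by the reference length. The sketch below illustrates that standard definition only; it is not the authors' evaluation script. The function names `edit_distance` and `cer` are hypothetical, and the abstract does not describe any text normalization (punctuation stripping, kana/kanji handling), so none is applied here.

```python
def edit_distance(ref: str, hyp: str) -> int:
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # turning ref[:i] into an empty hypothesis takes i deletions
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference character
                curr[j - 1] + 1,     # insertion of a hypothesis character
                prev[j - 1] + cost,  # substitution, or a match when cost is 0
            ))
        prev = curr
    return prev[-1]


def cer(ref: str, hyp: str) -> float:
    """Character Error Rate as a percentage of the reference length."""
    if not ref:
        raise ValueError("reference transcript must be non-empty")
    return 100.0 * edit_distance(ref, hyp) / len(ref)


if __name__ == "__main__":
    # Illustrative sentences, not taken from the paper's datasets.
    reference = "今日は晴れです"
    hypothesis = "今日は雨です"
    print(f"CER: {cer(reference, hypothesis):.1f}")
```

In this illustrative pair the hypothesis differs from the seven-character reference by one substitution and one deletion, so the score is 2 edits over 7 characters. Because Japanese text has no whitespace word boundaries, a character-level metric of this kind avoids depending on a tokenizer, which is one common reason CER is used instead of word error rate.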

Authors

Mark Bajo
Haruka Fukukawa
Ryuji Morita
Yuma Ogasawara

External Resources

View on arXiv (http://arxiv.org/abs/2412.10705v1)
