Research on optimal deep learning modeling in HaiNan dialect recognition.

Journal: Scientific Reports
Published Date:

Abstract

Speech recognition for the HaiNan dialect is challenging because its varieties differ markedly in phonology, intonation, and grammatical structure, and these differences are strongly regional, which makes dialect-to-Mandarin conversion more complex. Research on HaiNan dialect speech recognition is still in its early stages and lacks sufficient corpus resources, especially for multi-dialect recognition. Traditional models struggle to cope effectively with data scarcity and diverse dialect characteristics. To overcome these challenges, this study explores the application of multiple deep learning models to the task of converting the HaiNan dialect to Mandarin, aiming to identify the optimal deep learning model for HaiNan dialect recognition. Based on extensive comparative experiments, it proposes ConvMHANet, a fusion model that combines Convolutional Neural Networks with a Multi-Head Self-Attention Mechanism. ConvMHANet performs well across different dialect scenarios, particularly in handling complex dialectal phonetics and contextual dependencies: in the multi-class mixed task it achieves an accuracy of 97.58% and a character error rate as low as 0.0163, indicating strong generalization capability.
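
The character error rate quoted above is conventionally defined as the character-level edit distance between the recognized text and the reference transcript, divided by the number of reference characters. The abstract does not describe how the authors computed it, so the following is only a minimal sketch under that standard definition; the function names `edit_distance` and `character_error_rate` and the sample strings are illustrative assumptions, not taken from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two strings, counted in characters."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]  # distance from ref[:i] to the empty hypothesis
        for j, h in enumerate(hyp, 1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def character_error_rate(references, hypotheses):
    """Corpus-level CER: total edits divided by total reference characters."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_chars = sum(len(r) for r in references)
    if total_chars == 0:
        raise ValueError("references contain no characters")
    total_edits = sum(edit_distance(r, h)
                      for r, h in zip(references, hypotheses))
    return total_edits / total_chars


if __name__ == "__main__":
    # Hypothetical Mandarin outputs: one substituted character out of four.
    print(character_error_rate(["我爱海南"], ["我爱海口"]))
```

Under this definition, a value of 0.0163 would correspond to roughly 1.6 character errors per 100 reference characters.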

Authors

  • ZiXuan Qi
    College of Information Science and Technology, HaiNan Normal University, Haikou, 571158, China.
  • FuYun Li
    College of Information Science and Technology, HaiNan Normal University, Haikou, 571158, China. lifuyun@hainnu.edu.cn.
  • Haixia Long
    Department of Information Science and Technology, Hainan Normal University, Haikou 571158, China. myresearch_hainnu@163.com.