Research on Tone Enhancement of Mandarin Pitch Controllable Electrolaryngeal Speech Based on Deep Learning.
Journal:
Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:
40039465
Abstract
Deep learning-based electrolaryngeal (EL) voice conversion methods have achieved good results in non-tonal languages. However, their effectiveness in tonal languages, such as Mandarin Chinese (Mandarin), remains suboptimal, possibly because EL speech lacks any tone information. In this paper, we further improve the quality of EL speech by first generating basic tonal speech with a pitch-controlled electrolarynx and then employing a generative model to refine and enhance the Mandarin tones. Using the continuous wavelet transform, the speech fundamental frequency (F0) is decomposed into a hierarchy of multiple frequency scales, which increases the granularity of feature extraction. The resulting multi-scale features are then fed into a cycle-consistent adversarial network (CycleGAN) for training. Finally, the conversion results for the four Mandarin tones are compared and evaluated. The results show that the converted tones are significantly closer to normal tones in terms of tone contour, tone range, and tone value. The converted tones are not only smoother but also display details, such as the "offset-section and onset-section", that are similar to those of normal tones. After conversion, the F0 root mean square error for the four EL tones decreased markedly from 1.106 to 0.277, while the F0 correlation coefficient improved significantly from 0.393 to 0.781. Both evaluation indexes show that the accuracy of the converted tones has been greatly improved. The results demonstrate that the proposed method can be applied to tone enhancement in Mandarin pitch-controlled EL speech. This study opens a new avenue for EL speech enhancement in tonal languages, offering potential for future research and application.
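The multi-scale F0 decomposition and the two evaluation indexes named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not state the wavelet, the number of scales, or the unit of F0, so everything below is an assumption made for illustration: a Mexican hat mother wavelet, dyadic scales, and time-aligned contours in a common unit. The function names (`cwt_decompose`, `f0_rmse`, `f0_correlation`) are hypothetical and are not taken from the paper.

```python
import numpy as np


def mexican_hat(t):
    """Mexican hat (Ricker) mother wavelet; the wavelet choice is an assumption."""
    return (2.0 / (np.sqrt(3.0) * np.pi ** 0.25)) * (1.0 - t ** 2) * np.exp(-t ** 2 / 2.0)


def cwt_decompose(f0, scales):
    """Decompose an F0 contour into one coefficient track per scale.

    f0     : 1-D array, one value per frame (assumed already interpolated
             over unvoiced frames, which the abstract does not describe).
    scales : iterable of positive scale values, in frames.
    Returns an array of shape (len(scales), len(f0)).
    """
    f0 = np.asarray(f0, dtype=float)
    n = len(f0)
    coeffs = np.empty((len(scales), n))
    for i, s in enumerate(scales):
        # Truncate the kernel at +/- 5 scales, and never make it longer than
        # the signal so that mode="same" returns exactly n samples.
        half = int(min(5 * s, (n - 1) // 2))
        t = np.arange(-half, half + 1) / s
        kernel = mexican_hat(t) / np.sqrt(s)
        coeffs[i] = np.convolve(f0, kernel, mode="same")
    return coeffs


def f0_rmse(f0_ref, f0_conv):
    """Root mean square error between two time-aligned F0 contours."""
    f0_ref = np.asarray(f0_ref, dtype=float)
    f0_conv = np.asarray(f0_conv, dtype=float)
    return float(np.sqrt(np.mean((f0_ref - f0_conv) ** 2)))


def f0_correlation(f0_ref, f0_conv):
    """Pearson correlation coefficient between two time-aligned F0 contours."""
    return float(np.corrcoef(f0_ref, f0_conv)[0, 1])


if __name__ == "__main__":
    # Hypothetical data: a falling contour and a slightly perturbed copy.
    frames = np.linspace(0.0, 1.0, 100)
    reference = 5.3 - 0.5 * frames
    converted = reference + 0.05 * np.sin(2 * np.pi * 3 * frames)

    scales = 2.0 ** np.arange(1, 6)  # assumed dyadic scales: 2, 4, 8, 16, 32 frames
    features = cwt_decompose(converted, scales)
    print("feature shape:", features.shape)
    print("RMSE:", f0_rmse(reference, converted))
    print("correlation:", f0_correlation(reference, converted))
```

In a pipeline of the kind the abstract describes, the per-scale coefficient tracks would serve as the F0 features passed to the conversion model, while the two metrics would be computed between converted and normal-speech contours after time alignment. How the paper aligns and normalizes the contours is not stated here, so the sketch leaves that step out.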