Research on Tone Enhancement of Mandarin Pitch Controllable Electrolaryngeal Speech Based on Deep Learning.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:

Abstract

Deep learning-based electrolaryngeal (EL) voice conversion methods have achieved good results in non-tonal languages. However, their effectiveness in tonal languages, such as Mandarin Chinese (Mandarin), remains suboptimal, possibly because EL speech lacks tone information. In this paper, we further improve the quality of EL speech by first generating basic tonal speech with a pitch-controlled electrolarynx and then employing a generative model to refine and enhance the Mandarin tones. The speech fundamental frequency (F0) is decomposed by continuous wavelet transform into a hierarchy of multiple frequency scales, which enhances the granularity of feature extraction, and the resulting features are fed into a cycle-consistent adversarial network (CycleGAN) for training. Finally, the conversion results for the four Mandarin tones are compared and evaluated. The results show that the converted tones are significantly closer to the normal tones in terms of tone contour, tone range, and tone value. The converted tones are not only smoother but also display details, such as the "offset-section" and "onset-section", that are similar to those of the normal tones. After conversion, the F0 root mean square error for the four EL tones decreased markedly from 1.106 to 0.277, while the F0 correlation coefficient improved significantly from 0.393 to 0.781. Both evaluation indexes show that the accuracy of the converted tones has been greatly improved. The results demonstrate that the proposed method can be applied to tone enhancement in Mandarin pitch-controlled EL speech. This study opens a new way for EL speech enhancement in tonal languages, offering potential for future research and application.
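The multi-scale F0 decomposition and the two evaluation indexes mentioned in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption rather than the authors' implementation: the Ricker (Mexican hat) wavelet, the dyadic scale values, the per-contour normalization, and the function names `ricker`, `cwt_decompose`, `f0_rmse`, and `f0_corr` are hypothetical. The input is assumed to be a continuous F0 contour (for example log F0 interpolated over unvoiced frames), and the abstract does not state the unit in which the reported error is measured.

```python
import numpy as np

def ricker(points, a):
    """Ricker (Mexican hat) wavelet of width `a` sampled at `points` positions.
    The wavelet choice is an assumption; the abstract does not name one."""
    t = np.arange(points) - (points - 1) / 2.0
    norm = 2.0 / (np.sqrt(3.0 * a) * np.pi ** 0.25)
    return norm * (1.0 - (t / a) ** 2) * np.exp(-(t ** 2) / (2.0 * a ** 2))

def cwt_decompose(contour, scales):
    """Decompose a continuous F0 contour into one component per scale.
    Returns an array of shape (len(scales), len(contour))."""
    x = np.asarray(contour, dtype=float)
    x = (x - x.mean()) / x.std()  # zero mean, unit variance (assumed step)
    out = np.empty((len(scales), len(x)))
    for i, s in enumerate(scales):
        # Keep the kernel no longer than the signal so "same" returns len(x).
        width = int(min(10 * s, len(x)))
        out[i] = np.convolve(x, ricker(width, s), mode="same")
    return out

def f0_rmse(reference, converted):
    """Root mean square error between two equal-length F0 contours."""
    ref = np.asarray(reference, dtype=float)
    conv = np.asarray(converted, dtype=float)
    return float(np.sqrt(np.mean((ref - conv) ** 2)))

def f0_corr(reference, converted):
    """Pearson correlation coefficient between two equal-length F0 contours."""
    return float(np.corrcoef(reference, converted)[0, 1])

if __name__ == "__main__":
    # Synthetic rising contour standing in for a normal tone (illustrative only).
    normal = np.linspace(0.0, 1.0, 50)
    flat_el = np.full(50, 0.5) + 0.01 * np.sin(np.arange(50))
    components = cwt_decompose(normal, scales=[1, 2, 4, 8, 16])
    print(components.shape)
    print(f0_rmse(normal, flat_el), f0_corr(normal, flat_el))
```

In a pipeline of this kind, the per-scale components would serve as the F0 features passed to the conversion model, while the two metrics compare a converted contour against a time-aligned normal contour; how the contours are aligned is not described in the abstract and is left out of this sketch.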

Authors

  • Jie Zhou
    Departments of Ultrasound, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences, Shanghai, China.
  • Li Wang
    College of Marine Electrical Engineering, Dalian Maritime University, Dalian, China.
  • Fengji Li
  • Shaochuan Zhang
  • Tao Liu
    Institute of Urology and Nephrology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China.
  • Xiaohong Chen
    Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China.
  • Haijun Niu
    School of Biological Science and Medical Engineering, Beihang University, Beijing, China.