Mandarin Speech Reconstruction from Tongue Motion Ultrasound Images based on Generative Adversarial Networks.

Journal: Annual International Conference of the IEEE Engineering in Medicine and Biology Society. IEEE Engineering in Medicine and Biology Society. Annual International Conference
PMID:

Abstract

Speech impairment resulting from laryngectomy causes severe physiological and psychological distress to laryngectomees. In clinical practice, the articulatory organs of the upper vocal tract function normally in most laryngectomees. Reconstructing speech from articulatory information could therefore contribute meaningfully to effective speech rehabilitation in these patients. We first created a Mandarin corpus by simultaneously recording dynamic tongue motion ultrasound images and speech waveforms. We then used an autoencoder to extract deep representations from the ultrasound images. Building on this, we established a speech waveform generation model using generative adversarial networks and conducted both objective and subjective evaluations to assess the quality of the reconstructed speech. The results show that the phoneme accuracy of the reconstructed speech reaches 72.43% and the Mandarin tone accuracy reaches 76.10%. The mel-spectrograms and fundamental frequency contours of the reconstructed speech show a high degree of similarity to those of the original speech. Additionally, subjective perception ratings of the reconstructed speech affirm its acceptability (mean opinion score > 6). The method presented in this paper enables the reconstruction of tonal Mandarin speech from dynamic tongue motion ultrasound images. However, future research should focus on the specific conditions of laryngectomees, improving and optimizing model performance, expanding the training datasets, and enhancing the quality of the reconstructed speech.
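
The abstract reports phoneme and tone accuracies but does not say how they were computed. As an illustration only, the sketch below assumes a common convention from speech recognition scoring: accuracy is one minus the edit distance between the reference and recognized token sequences, divided by the reference length. The function names and the pinyin-style tokens are hypothetical and do not come from the paper.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two token sequences."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def token_accuracy(ref, hyp):
    """Assumed metric: 1 - edit_distance / len(ref), floored at 0."""
    if not ref:
        raise ValueError("reference sequence must not be empty")
    return max(0.0, 1.0 - edit_distance(ref, hyp) / len(ref))


# Illustrative tokens only: one tone substitution in four reference tokens.
reference = ["n", "i3", "h", "ao3"]
recognized = ["n", "i2", "h", "ao3"]
print(token_accuracy(reference, recognized))  # 0.75
```

Under this assumption the same function would serve for both phoneme and tone scoring, with only the token inventory changing.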

Authors

  • Fengji Li
  • Fei Shen
    Physical and Chemical Laboratory, Jiangsu Provincial Center for Disease Control & Prevention, 172 Jiangsu Rd, Nanjing, 210009, China.
  • Ding Ma
    Department of Obstetrics and Gynaecology, Tongji Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei 430030, China.
  • Shaochuan Zhang
  • Jie Zhou
    Departments of Ultrasound, Jiading District Central Hospital Affiliated Shanghai University of Medicine & Health Sciences, Shanghai, China.
  • Li Wang
    College of Marine Electrical Engineering, Dalian Maritime University, Dalian, China.
  • Fan Fan
    Key Laboratory of the Ministry of Education for Optoelectronic Measurement Technology and Instrument, Beijing Information Science and Technology University, Beijing, China.
  • Tao Liu
    Institute of Urology and Nephrology, The First Affiliated Hospital of Guangxi Medical University, Nanning, China.
  • Xiaohong Chen
    Department of Neurology, The Third Affiliated Hospital of Sun Yat-Sen University, Guangzhou, China.
  • Tomoki Toda
  • Haijun Niu
    School of Biological Science and Medical Engineering, Beihang University, Beijing, China.