Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images.

Journal: The Journal of the Acoustical Society of America

Abstract

Tongue gestural target classification is of great interest to researchers in the field of speech production. Recently, deep convolutional neural networks (CNNs) have shown superiority over standard feature-extraction techniques in a variety of domains. In this letter, both speaker-dependent and speaker-independent CNN-based classification experiments are conducted to classify tongue gestural targets during natural speech production. The CNN-based method achieves state-of-the-art performance, even though no pre-training of the CNN was carried out (apart from a data augmentation preprocessing step).
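The abstract mentions a data augmentation preprocessing step but does not specify which transforms were used; the NumPy sketch below illustrates one common choice for image data, small random translations of each frame (the function name `augment` and the shift range are assumptions, not the authors' implementation):

```python
import numpy as np

def augment(image, rng, max_shift=5):
    """Return a randomly shifted copy of a 2-D image frame.

    Hypothetical augmentation: the letter does not describe its exact
    transforms, so a small circular translation is assumed here.
    Shifted-out pixels wrap around, which keeps the sketch simple;
    a real pipeline might pad or crop instead.
    """
    dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
    return np.roll(image, shift=(dy, dx), axis=(0, 1))

# Example: augment one synthetic 64x64 frame.
rng = np.random.default_rng(0)
frame = np.arange(64 * 64, dtype=float).reshape(64, 64)
shifted = augment(frame, rng)
```

Because the translation is circular, the augmented frame has the same shape and the same pixel values as the original, just rearranged.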

Authors

  • Kele Xu
    Department of Engineering, Université Pierre et Marie Curie, Paris 75005, France; kelele.xu@gmail.com
  • Pierre Roussel
    Langevin Institute, ESPCI-ParisTech, Paris 75005, France; pierre.roussel@espci.fr
  • Tamás Gábor Csapó
    Department of Telecommunications and Media Informatics, Budapest University of Technology and Economics, Budapest, Hungary; csapot@tmit.bme.hu
  • Bruce Denby
    Tianjin University, Tianjin 300000, China; bruce.denby@upmc.fr