Improved speech inversion using general regression neural network.

Journal: The Journal of the Acoustical Society of America
Published Date:

Abstract

The problem of nonlinear acoustic-to-articulatory inversion mapping is investigated in the feature space using two models: the deep belief network (DBN), which is the state of the art, and the general regression neural network (GRNN). The task is to estimate a set of articulatory features for improved speech recognition. Experiments with the MOCHA-TIMIT and MNGU0 databases reveal that, for speech inversion, GRNN yields a lower root-mean-square error and a higher correlation than DBN. It is also shown that combining acoustic and GRNN-estimated articulatory features yields state-of-the-art accuracy in broad-class phonetic classification and phoneme recognition while using less computational power.
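
For readers unfamiliar with the GRNN, the following is a minimal sketch of the general idea, not the authors' implementation. It assumes the standard formulation in which the estimate is a Gaussian-kernel-weighted average of the training targets, and it includes the two evaluation measures named in the abstract (root-mean-square error and correlation, here taken to be Pearson correlation). The function names, the smoothing parameter `sigma`, and the toy data are hypothetical. The abstract does not specify the features, the kernel width, or any other settings used in the experiments.

```python
import math


def grnn_predict(train_x, train_y, query, sigma=0.5):
    """Estimate a target vector as a kernel-weighted average of training targets.

    train_x: list of input vectors (stand-ins for acoustic features)
    train_y: list of target vectors (stand-ins for articulatory features)
    query:   one input vector
    sigma:   kernel width (an assumed value, not taken from the paper)
    """
    # Exponent of the Gaussian kernel for every training example.
    exponents = []
    for x in train_x:
        sq_dist = sum((a - b) ** 2 for a, b in zip(x, query))
        exponents.append(-sq_dist / (2.0 * sigma ** 2))

    # Subtract the largest exponent so at least one weight equals 1.0;
    # this avoids a zero denominator when all kernels underflow.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)

    dims = len(train_y[0])
    return [
        sum(w * y[d] for w, y in zip(weights, train_y)) / total
        for d in range(dims)
    ]


def rmse(predicted, actual):
    """Root-mean-square error between two equal-length sequences."""
    n = len(actual)
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / n)


def pearson(predicted, actual):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(actual)
    mean_p = sum(predicted) / n
    mean_a = sum(actual) / n
    cov = sum((p - mean_p) * (a - mean_a) for p, a in zip(predicted, actual))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_a = sum((a - mean_a) ** 2 for a in actual)
    if var_p == 0.0 or var_a == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_p * var_a)


# Toy one-dimensional data, purely illustrative.
toy_x = [[0.0], [1.0], [2.0], [3.0]]
toy_y = [[0.0], [1.0], [4.0], [9.0]]

if __name__ == "__main__":
    estimates = [grnn_predict(toy_x, toy_y, x, sigma=0.3)[0] for x in toy_x]
    targets = [y[0] for y in toy_y]
    print("RMSE:", rmse(estimates, targets))
    print("Correlation:", pearson(estimates, targets))
```

In this sketch there is no iterative training: the stored examples are the model and `sigma` is the only parameter to tune. A small `sigma` makes the estimate follow the nearest training example, while a large one pulls it toward the mean of all targets.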

Authors

  • Shamima Najnin
    Institute for Intelligent Systems and Department of Electrical and Computer Engineering, 3815 Central Avenue, The University of Memphis, Memphis, Tennessee 38152, USA; snajnin@memphis.edu
  • Bonny Banerjee
    Institute for Intelligent Systems and Department of Electrical and Computer Engineering, 3815 Central Avenue, The University of Memphis, Memphis, Tennessee 38152, USA; bbnerjee@memphis.edu