A transfer learning approach to goodness of pronunciation based automatic mispronunciation detection.

Journal: The Journal of the Acoustical Society of America

Published Date: Nov 1, 2017

Abstract

Goodness of pronunciation (GOP) is the most widely used method for automatic mispronunciation detection. In this paper, a transfer learning approach to GOP based mispronunciation detection when applying maximum F1-score criterion (MFC) training to deep neural network (DNN)-hidden Markov model based acoustic models is proposed. Rather than train the whole network using MFC, a DNN is used, whose hidden layers are borrowed from native speech recognition with only the softmax layer trained according to the MFC objective function. As a result, significant mispronunciation detection improvement is obtained. In light of this, the two-stage transfer learning based GOP is investigated in depth. The first stage exploits the hidden layer(s) to extract phonetic-discriminating features. The second stage uses a trainable softmax layer to learn the human standard for judgment. The validation is carried out by experimenting with different mispronunciation detection architectures using acoustic models trained by different criteria. It is found that it is preferable to use frame-level cross-entropy to train the hidden layer parameters. Classifier based mispronunciation detection is further experimented with using features computed by transfer learning based GOP and it is shown that it also helps to achieve better results.
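
The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes one common DNN-based GOP formulation (the average log posterior of the canonical phone over the frames of a segment), a single ReLU hidden layer standing in for the layers borrowed from native speech recognition, and random weights in place of trained ones. The MFC objective used to train the softmax layer is not specified in the abstract and is not reproduced here. All names (`hidden_features`, `gop_score`, `threshold`) are hypothetical.

```python
import numpy as np

def log_softmax(z):
    # Numerically stable log-softmax over the phone dimension.
    z = z - z.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))

def hidden_features(frames, W_h, b_h):
    # Stage 1: frozen hidden layer acting as a phonetic feature extractor
    # (stand-in for layers borrowed from a native-speech acoustic model).
    return np.maximum(0.0, frames @ W_h + b_h)

def gop_score(frames, canonical_phone, W_h, b_h, W_s, b_s):
    # Stage 2: a softmax layer (the only part that would be retrained)
    # maps the features to phone posteriors.
    log_post = log_softmax(hidden_features(frames, W_h, b_h) @ W_s + b_s)
    # Assumed GOP definition: mean log posterior of the canonical phone.
    return float(np.mean(log_post[:, canonical_phone]))

# Toy data: random values stand in for acoustic frames and trained weights.
rng = np.random.default_rng(0)
n_frames, feat_dim, hid_dim, n_phones = 12, 8, 16, 5
frames = rng.normal(size=(n_frames, feat_dim))
W_h, b_h = rng.normal(size=(feat_dim, hid_dim)), np.zeros(hid_dim)
W_s, b_s = rng.normal(size=(hid_dim, n_phones)), np.zeros(n_phones)

score = gop_score(frames, 2, W_h, b_h, W_s, b_s)
threshold = -2.0  # hypothetical decision threshold
is_mispronounced = score < threshold
```

In this sketch only `W_s` and `b_s` would be updated during the second training stage, while `W_h` and `b_h` stay fixed. A segment is flagged as mispronounced when its score falls below the threshold.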

Authors

Hao Huang
School of Information Science and Engineering, Xinjiang University, Shangli Road, Urumqi 830046, China.

Haihua Xu
HBISolutions Inc., Palo Alto, CA 94301, USA.

Ying Hu
Department of Ultrasonography, The First Affiliated Hospital, College of Medicine, Zhejiang University, Qingchun Road No. 79, Hangzhou, Zhejiang 310003, China.

Gang Zhou
School of Information Science and Engineering, Xinjiang University, Shangli Road, Urumqi 830046, China.

Keywords

Acoustics; Deep Learning; Humans; Judgment; Markov Chains; Pattern Recognition, Automated; Phonetics; Signal Processing, Computer-Assisted; Software; Speech Acoustics; Speech Perception; Speech Production Measurement; Voice Quality

External Resources

PubMed ID: 29195422
