Convolutional neural network-based automatic classification of midsagittal tongue gestural targets using B-mode ultrasound images.
Journal: The Journal of the Acoustical Society of America
Published Date: Jun 1, 2017
Abstract
Tongue gestural target classification is of great interest to researchers in the speech production field. Recently, deep convolutional neural networks (CNNs) have outperformed standard feature extraction techniques in a variety of domains. In this letter, both speaker-dependent and speaker-independent CNN-based experiments are conducted to classify tongue gestural targets during natural speech production. The CNN-based method achieves state-of-the-art performance even though the CNN was not pre-trained; the only additional step was data augmentation applied as preprocessing.
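The difference between the speaker-dependent and speaker-independent settings can be illustrated with a minimal sketch of how the data might be partitioned. Everything below is an assumption made for illustration, not the letter's actual protocol: the sample layout (`speaker`, `frame`, `label`), the speaker IDs, the placeholder labels, the 20% test fraction, and the choice of a leave-one-speaker-out scheme for the speaker-independent case.

```python
import random
from collections import defaultdict


def speaker_dependent_split(samples, test_fraction=0.2, seed=0):
    """Split each speaker's frames into train and test parts.

    Every speaker contributes frames to both sets, so the classifier
    is evaluated on speakers it has already seen during training.
    """
    rng = random.Random(seed)
    by_speaker = defaultdict(list)
    for sample in samples:
        by_speaker[sample["speaker"]].append(sample)

    train, test = [], []
    for speaker in sorted(by_speaker):
        frames = list(by_speaker[speaker])
        rng.shuffle(frames)
        n_test = max(1, int(len(frames) * test_fraction))
        test.extend(frames[:n_test])
        train.extend(frames[n_test:])
    return train, test


def speaker_independent_split(samples, held_out_speaker):
    """Hold out all frames of one speaker for testing.

    The held-out speaker never appears in training, so the classifier
    is evaluated on an unseen speaker (leave-one-speaker-out, assumed here).
    """
    train = [s for s in samples if s["speaker"] != held_out_speaker]
    test = [s for s in samples if s["speaker"] == held_out_speaker]
    return train, test


# Hypothetical dataset: 3 speakers, 10 ultrasound frames each.
# "target_a" / "target_b" are placeholder class names, not the letter's labels.
samples = [
    {"speaker": spk, "frame": i, "label": "target_a" if i % 2 == 0 else "target_b"}
    for spk in ("S1", "S2", "S3")
    for i in range(10)
]

dep_train, dep_test = speaker_dependent_split(samples)
ind_train, ind_test = speaker_independent_split(samples, held_out_speaker="S1")
```

In the speaker-dependent split every speaker appears on both sides, whereas in the speaker-independent split the training and test speakers are disjoint, which typically makes the latter the harder setting.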