Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition.

Journal: The Journal of the Acoustical Society of America

Published Date: Jun 1, 2017

Abstract

In this letter, the authors propose an auditory feature representation technique in which the filterbank is learned using an annealing dropout convolutional restricted Boltzmann machine (ConvRBM) and noise-robust energy estimation is performed using the Teager energy operator (TEO). The TEO is applied to each subband of the ConvRBM filterbank, and the result is pooled to obtain short-term spectral features. Experiments on the AURORA 4 database show that the proposed features perform better than Mel filterbank features. Relative improvements in word error rate of 2.59%-11.63% and 1.26%-6.87% are achieved using time delay neural network and bidirectional long short-term memory models, respectively.
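The energy-estimation step described in the abstract can be illustrated with a minimal sketch. It uses the standard discrete-time Teager energy operator, psi[n] = x[n]^2 - x[n-1] * x[n+1], applied to each subband signal and then average-pooled over short frames. Everything else is an assumption made for illustration and is not taken from the letter: the function names, the frame length and hop (400 and 160 samples), the use of the absolute value before pooling, the log compression, and the fact that the subband signals are simply passed in as an array rather than produced by a trained ConvRBM filterbank.

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
    # The two edge samples lack a neighbour on one side, so they reuse
    # the nearest interior value (an assumption of this sketch).
    psi[0] = psi[1]
    psi[-1] = psi[-2]
    return psi


def pooled_teo_features(subbands, frame_len=400, hop=160, eps=1e-8):
    """Average-pool the TEO magnitude of each subband over short frames.

    subbands: array of shape (n_filters, n_samples), one row per filter
              output (hypothetical stand-in for ConvRBM subband signals).
    Returns an array of shape (n_filters, n_frames) of log-compressed values.
    """
    subbands = np.atleast_2d(np.asarray(subbands, dtype=float))
    n_samples = subbands.shape[1]
    # The TEO can go negative for non-sinusoidal signals; taking the
    # magnitude here is an assumption, not the letter's stated choice.
    energy = np.abs(np.stack([teager_energy(band) for band in subbands]))
    n_frames = 1 + (n_samples - frame_len) // hop
    pooled = np.stack(
        [energy[:, i * hop:i * hop + frame_len].mean(axis=1)
         for i in range(n_frames)],
        axis=1,
    )
    return np.log(pooled + eps)
```

For a pure sinusoid A*cos(w*n), the operator returns the constant A^2 * sin^2(w), which tracks both amplitude and frequency; this property is why the TEO is commonly used as an energy estimator on narrowband subband signals.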

Authors

Hardik B Sailor

Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar-382007, Gujarat, India. Email: sailor_hardik@daiict.ac.in

Hemant A Patil

Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar-382007, Gujarat, India. Email: hemant_patil@daiict.ac.in

Keywords

Acoustics; Humans; Machine Learning; Neural Networks, Computer; Pattern Recognition, Automated; Signal Processing, Computer-Assisted; Sound Spectrography; Speech Acoustics; Speech Production Measurement; Time Factors; Voice Quality

External Resources

PubMed ID: 28618812
