Auditory feature representation using convolutional restricted Boltzmann machine and Teager energy operator for speech recognition.

Journal: The Journal of the Acoustical Society of America
Published Date:

Abstract

In this letter, the authors propose an auditory feature representation technique in which the filterbank is learned using a convolutional restricted Boltzmann machine (ConvRBM) trained with annealed dropout, and noise-robust energy estimation is performed using the Teager energy operator (TEO). The TEO is applied to each subband of the ConvRBM filterbank, and the result is then pooled to obtain short-term spectral features. Experiments on the AURORA 4 database show that the proposed features perform better than Mel filterbank features. Relative improvements in word error rate of 2.59%-11.63% and 1.26%-6.87% are achieved using time delay neural network and bidirectional long short-term memory models, respectively.
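
The per-subband energy step described above can be illustrated with a minimal sketch. It uses the standard discrete-time form of the Teager energy operator, psi[n] = x[n]^2 - x[n-1] x[n+1], followed by average pooling over short frames. This is not the authors' implementation: the function names, the edge handling, the absolute value, the log compression, and the frame parameters are all assumptions made for illustration, and the subband signals are assumed to be already available as the outputs of some filterbank (the ConvRBM itself is not reproduced here).

```python
import numpy as np


def teager_energy(x):
    """Discrete Teager energy: psi[n] = x[n]^2 - x[n-1] * x[n+1].

    Operates along the last axis, so it accepts a single signal or a
    (n_filters, n_samples) array of subband signals.
    """
    x = np.asarray(x, dtype=float)
    psi = np.empty_like(x)
    psi[..., 1:-1] = x[..., 1:-1] ** 2 - x[..., :-2] * x[..., 2:]
    # Assumption: edge samples lack a two-sided neighbourhood, so the
    # nearest interior value is copied to keep the output length unchanged.
    psi[..., 0] = psi[..., 1]
    psi[..., -1] = psi[..., -2]
    return psi


def pooled_teo_features(subbands, frame_len, hop):
    """Average-pool the TEO of each subband over short-time frames.

    subbands: array of shape (n_filters, n_samples), assumed to be the
              outputs of an already-learned filterbank.
    Returns an array of shape (n_filters, n_frames).
    """
    # Assumption: the magnitude is taken because the TEO can go negative
    # for signals that are not narrowband.
    energy = np.abs(teager_energy(subbands))
    n_frames = 1 + (energy.shape[-1] - frame_len) // hop
    pooled = np.stack(
        [energy[:, i * hop:i * hop + frame_len].mean(axis=1)
         for i in range(n_frames)],
        axis=1,
    )
    # Assumption: log compression with a small floor, as is common for
    # filterbank features; the letter's exact nonlinearity is not given here.
    return np.log(pooled + 1e-8)
```

For a pure sinusoid A cos(Ωn), the operator returns the constant A² sin²(Ω), which is why it is often described as tracking both amplitude and frequency of a narrowband component.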

Authors

  • Hardik B Sailor
    Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar-382007, Gujarat, India. sailor_hardik@daiict.ac.in
  • Hemant A Patil
    Speech Research Lab, Dhirubhai Ambani Institute of Information and Communication Technology (DA-IICT), Gandhinagar-382007, Gujarat, India. hemant_patil@daiict.ac.in