Unraveling the complexities of pathological voice through saliency analysis.

Journal: Computers in biology and medicine

Abstract

The human voice is an essential communication tool, but various disorders and habits can disrupt it, which makes the diagnosis of pathological and abnormal voices important. Conventional diagnosis of these voice pathologies can be invasive and costly, whereas Artificial Intelligence and computer-aided voice pathology classification tools can detect such disorders effectively. Previous studies focused primarily on binary classification and paid limited attention to multi-class classification. This study proposes three neural network architectures to investigate the feature characteristics of three voice pathologies (Hyperkinetic Dysphonia, Hypokinetic Dysphonia, and Reflux Laryngitis) and healthy voices through multi-class classification on the Voice ICar fEDerico II (VOICED) dataset. To cope with noisy recordings, the study proposes UNet++ autoencoder-based denoising techniques that support accurate feature extraction. The architectures are a Multi-Layer Perceptron (MLP) trained on structured feature sets, a Short-Time Fourier Transform (STFT) model, and a Mel-Frequency Cepstral Coefficients (MFCC) model. The MLP model trained on 143 features achieved 97.1% accuracy, while the STFT model performed similarly with a higher sensitivity of 99.8%. The MFCC model also reached 97.1% accuracy, with a smaller model size and improved accuracy on the Reflux Laryngitis class. Through saliency analysis, the study identifies crucial features and reveals that detecting voice abnormalities requires identifying regions of inaudible high-pitch sounds. Additionally, the study highlights the challenges posed by limited and disjointed pathological voice databases and proposes solutions for improving the performance of voice abnormality classification. Overall, the findings have potential applications in clinical settings and in specialized audio-capturing tools.
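The abstract does not state how the saliency scores were computed. One common choice, assumed here purely for illustration, is the magnitude of the gradient of a class logit with respect to each input feature. The sketch below applies that idea to a one-hidden-layer ReLU MLP whose four outputs match the study's labels (healthy plus the three pathologies). The three input features, all weights, and the helper names `forward` and `saliency` are hypothetical toy values, not the study's 143-feature model or its exact method.

```python
"""Input-gradient saliency for a tiny ReLU MLP (illustrative sketch only).

Assumption: toy weights and three hypothetical features; this is not the
study's trained 143-feature model or its exact saliency method.
"""
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]

# Output classes follow the four labels described in the abstract.
CLASSES = ["Healthy", "Hyperkinetic Dysphonia", "Hypokinetic Dysphonia", "Reflux Laryngitis"]


def forward(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector) -> Tuple[Vector, Vector]:
    """Return hidden pre-activations z and class logits for input x."""
    z = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(w1, b1)]
    h = [max(0.0, zj) for zj in z]  # ReLU activation
    logits = [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(w2, b2)]
    return z, logits


def saliency(x: Vector, w1: Matrix, b1: Vector, w2: Matrix, b2: Vector, target: int) -> Vector:
    """Return |d logit[target] / d x_i| for every input feature i.

    Chain rule through the ReLU: d h_j / d z_j is 1 if z_j > 0, else 0.
    """
    z, _ = forward(x, w1, b1, w2, b2)
    scores = []
    for i in range(len(x)):
        grad = sum(
            w2[target][j] * (1.0 if z[j] > 0 else 0.0) * w1[j][i]
            for j in range(len(z))
        )
        scores.append(abs(grad))
    return scores


# Hypothetical toy parameters: 3 input features -> 2 hidden units -> 4 classes.
W1 = [[1.0, 0.0, 0.5], [-0.5, 1.0, 0.0]]
B1 = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0], [2.0, 0.0]]
B2 = [0.0, 0.0, 0.0, 0.0]
x = [0.5, -1.0, 2.0]

_, logits = forward(x, W1, B1, W2, B2)
pred = max(range(len(logits)), key=lambda c: logits[c])  # predicted class index
scores = saliency(x, W1, B1, W2, B2, pred)
ranking = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
print(CLASSES[pred], scores, ranking)
```

Features whose small perturbations move the predicted class's logit the most rank highest. In this toy case feature 0 dominates, and feature 1 scores zero because it has no weight into the active hidden unit and the unit it does feed is switched off by the ReLU.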

Authors

  • Abdullah Abdul Sattar Shaikh
    Department of Computer Science and Engineering, Bangalore Institute of Technology, Bangalore, 560004, Karnataka, India. Electronic address: abdullahshaikh136@gmail.com.
  • M S Bhargavi
    Department of Computer Science and Engineering, Bangalore Institute of Technology, Bangalore, 560004, Karnataka, India. Electronic address: ms.bhargavi@gmail.com.
  • Ganesh R Naik
    MARCS Institute for Brain, Behaviour and Development, Western Sydney University, Penrith, NSW, Australia.