A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers.

Journal: BMC medical informatics and decision making

PMID: 40312383

Abstract

Recent advances in artificial intelligence-based audio and speech processing have increasingly focused on the binary and multi-class classification of voice disorders. Despite progress, achieving high accuracy in multi-class classification remains challenging. This paper proposes a novel hybrid approach using a two-stage framework to enhance voice disorders classification performance, and achieve state-of-the-art accuracies in multi-class classification. Our hybrid approach, combines deep learning features with various powerful classifiers. In the first stage, high-level feature embeddings are extracted from voice data spectrograms using a pre-trained VGGish model. In the second stage, these embeddings are used as input to four different classifiers: Support Vector Machine (SVM), Logistic Regression (LR), Multi-Layer Perceptron (MLP), and an Ensemble Classifier (EC). Experiments are conducted on a subset of the Saarbruecken Voice Database (SVD) for male, female, and combined speakers. For binary classification, VGGish-SVM achieved the highest accuracy for male speakers (82.45% for healthy vs. disordered; 75.45% for hyperfunctional dysphonia vs. vocal fold paresis), while VGGish-EC performed best for female speakers (71.54% for healthy vs. disordered; 68.42% for hyperfunctional dysphonia vs. vocal fold paresis). In multi-class classification, VGGish-SVM outperformed other models, achieving mean accuracies of 77.81% for male speakers, 63.11% for female speakers, and 70.53% for combined genders. We conducted a comparative analysis against related works, including the Mel frequency cepstral coefficient (MFCC), MFCC-glottal features, and features extracted using the wav2vec and HuBERT models with SVM classifier. Results demonstrate that our hybrid approach consistently outperforms these models, especially in multi-class classification tasks. The results show the feasibility of a hybrid framework for voice disorder classification, offering a foundation for refining automated tools that could support clinical assessments with further validation.

Authors

Mehtab Ur Rahman

Department of Language and Communication, Radboud University, Houtlaan, Nijmegen, Gelderland, 6525, Netherlands. mehtab.rahman@ru.nl.
Cem Direkoglu

Electrical and Electronics Engineering Department, Middle East Technical University, Northern Cyprus Campus, Kalkanli, Güzelyurt, Mersin 10, 99738, Turkey.

Keywords

Deep Learning Female Humans Male Support Vector Machine Voice Disorders

External Resources

View on PubMed Access via DOI PubMed (40312383)

A hybrid approach for binary and multi-class classification of voice disorders using a pre-trained model and ensemble classifiers.

Abstract

Authors

Keywords

External Resources

Popular Topics

Recent Journals