Evaluating the Effectiveness of Pre-Trained Audio Embeddings for Classification of Parkinson's Disease Speech Data
Journal: arXiv
Published Date: Jun 2, 2025
Abstract
Speech impairments are common biomarkers of Parkinson's Disease (PD),
motivating the development of speech-based diagnostic techniques for
clinical use. Although deep acoustic features have shown promise for PD
classification, their effectiveness often varies across individual speakers,
a factor that the existing literature has not thoroughly explored.
This study investigates the effectiveness of three pre-trained audio
embeddings (OpenL3, VGGish, and Wav2Vec2.0) for PD classification. On the
NeuroVoz dataset, OpenL3 outperforms the other models in the diadochokinesis
(DDK) and listen-and-repeat (LR) tasks, capturing acoustic features critical
for PD detection. Only Wav2Vec2.0 shows a significant gender bias, achieving
more favorable results for male speakers in the DDK tasks. Analysis of the
misclassified cases reveals challenges with atypical speech patterns,
highlighting the need for improved feature extraction and more robust models
for PD detection.
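The gender-bias finding rests on comparing classifier performance separately for male and female speakers. The sketch below shows one way such a per-group comparison could be computed. It is illustrative and makes several assumptions. The labels and predictions are hypothetical toy values, not NeuroVoz results. The helper name `accuracy_by_group` is invented for this example. Accuracy is used as the metric only for simplicity. The abstract does not state which metric or statistical test the study used, and a raw accuracy gap on its own does not establish statistical significance.

```python
from collections import defaultdict


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: accuracy} for binary PD (1) vs. healthy control (0) predictions.

    Hypothetical helper for illustration; not taken from the study.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


# Hypothetical toy data: 1 = PD, 0 = healthy control; "M"/"F" = speaker gender.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 0, 1]
groups = ["M", "M", "M", "M", "F", "F", "F", "F"]

acc = accuracy_by_group(y_true, y_pred, groups)
# Positive gap means the classifier does better on male speakers.
gap = acc["M"] - acc["F"]
print(acc, gap)  # {'M': 0.75, 'F': 0.25} 0.5
```

In practice, a gap like this would be paired with a significance test, such as a permutation test over group labels, before the gap is reported as a bias.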