Perceptual Implications of Automatic Anonymization in Pathological Speech
Journal: arXiv
Published Date: May 1, 2025
Abstract
Automatic anonymization techniques are essential for ethical sharing of
pathological speech data, yet their perceptual consequences remain
understudied. This study presents the first comprehensive human-centered
analysis of anonymized pathological speech, using a structured perceptual
protocol involving ten listeners, both native and non-native speakers of
German, with diverse linguistic, clinical, and technical backgrounds. Listeners evaluated
anonymized-original utterance pairs from 180 speakers spanning Cleft Lip and
Palate, Dysarthria, Dysglossia, Dysphonia, and age-matched healthy controls.
Speech was anonymized using state-of-the-art automatic methods (equal error
rates in the range of 30-40%). Listeners completed Turing-style discrimination
and quality rating tasks under zero-shot (single-exposure) and few-shot
(repeated-exposure) conditions. Discrimination accuracy was high overall (91%
zero-shot; 93% few-shot), but varied by disorder (repeated-measures ANOVA:
p=0.007), ranging from 96% (Dysarthria) to 86% (Dysphonia). Anonymization
consistently reduced perceived quality (from 83% to 59%, p<0.001), with
pathology-specific degradation patterns (one-way ANOVA: p=0.005). Native
listeners rated original speech slightly higher than non-native listeners,
although the difference was not statistically significant (Δ=4%, p=0.199),
and it nearly disappeared after anonymization
(Δ=1%, p=0.724). No significant gender-based bias was observed. Critically,
human perceptual outcomes did not correlate with automatic privacy or clinical
utility metrics. These results underscore the need for listener-informed,
disorder- and context-specific anonymization strategies that preserve privacy
while maintaining interpretability, communicative functions, and diagnostic
utility, especially for vulnerable populations such as children.
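
The abstract reports the strength of the anonymization as an equal error rate (EER) of 30-40%. As a reading aid, the sketch below shows one common way an EER can be approximated from speaker-verification scores: sweep a decision threshold and take the point where the false acceptance rate and the false rejection rate are closest. In anonymization evaluations, a higher EER for the verifier is generally read as stronger privacy. The function name, the score lists, and the threshold sweep are illustrative assumptions and are not taken from the study's evaluation code.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Approximate the EER by trying every observed score as a threshold.

    genuine_scores: similarity scores for same-speaker trials (hypothetical data).
    impostor_scores: similarity scores for different-speaker trials (hypothetical data).
    A trial is accepted when its score is >= the threshold.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best_gap = float("inf")
    eer = None
    for t in thresholds:
        # False acceptance rate: impostor trials wrongly accepted.
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # False rejection rate: genuine trials wrongly rejected.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Report the midpoint of FAR and FRR at the closest crossing.
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Toy scores, invented for illustration only.
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
print(f"EER = {equal_error_rate(genuine, impostor):.0%}")
```

With these toy scores one genuine trial and one impostor trial overlap, so the error rates cross at 25%. Real evaluations use many more trials and often interpolate between thresholds rather than picking the nearest observed score.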