Deep Learning-Based Acoustic Screening for Penetration-Aspiration Events Using Short Voice Recordings
Journal: Dysphagia
Published Date: Jun 2, 2026
Abstract
This study aimed to evaluate the feasibility of a smartphone-based deep learning artificial intelligence (AI) tool for detecting post-swallow airway compromise through brief acoustic analysis of voice recordings obtained before and after swallowing.

This multicenter prospective study used a 1.5-second sustained phonation ("a~") recorded on a smartphone from patients referred for videofluoroscopic swallowing studies (VFSS). Cases were classified with the Penetration-Aspiration Scale (PAS): PAS 1 was defined as normal and PAS 2-8 as abnormal (penetration-aspiration events). An autoencoder-based anomaly detection model was trained on normal (PAS 1) data only and evaluated using sensitivity, specificity, accuracy, and the area under the receiver operating characteristic curve (AUC).

Among 208 participants, the model achieved a sensitivity of 90.9%, a specificity of 87.5%, an accuracy of 90.4%, and an AUC of 0.98 in the validation set. In the independent test set, sensitivity was 91.9%, specificity 50.0%, accuracy 85.2%, and AUC 0.76.

A brief 1.5-second voice recording analyzed with a deep learning model showed promising internal performance for screening post-swallow airway compromise, although specificity was lower in the independent test set. This approach may serve as a practical, accessible adjunct for identifying individuals who require further instrumental assessment.
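The screening logic described above (fit only on PAS 1, flag recordings the model reconstructs poorly, then score with sensitivity, specificity, accuracy, and AUC) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: it assumes per-recording reconstruction errors have already been produced by an autoencoder trained on PAS 1 recordings, and the threshold rule (a nearest-rank quantile of training errors), the function names, and the toy numbers are all hypothetical. The abstract does not say how the decision threshold was chosen.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical sketch: the reconstruction errors used here stand in for the
# output of an autoencoder trained only on PAS 1 (normal) recordings. Function
# names, the quantile threshold rule, and all numbers are illustrative assumptions.


def pick_threshold(normal_errors: Sequence[float], quantile: float = 0.95) -> float:
    """Nearest-rank quantile of reconstruction errors on normal training data."""
    if not normal_errors or not 0.0 < quantile <= 1.0:
        raise ValueError("need non-empty errors and 0 < quantile <= 1")
    ordered = sorted(normal_errors)
    rank = max(1, math.ceil(quantile * len(ordered)))
    return ordered[min(rank, len(ordered)) - 1]


def classify(errors: Sequence[float], threshold: float) -> List[bool]:
    """Flag a recording as abnormal (PAS 2-8) when its error exceeds the threshold."""
    return [e > threshold for e in errors]


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (e.g. no normal cases present) are reported as NaN.
    return num / den if den else float("nan")


def screening_metrics(labels: Sequence[bool], preds: Sequence[bool]) -> Dict[str, float]:
    """labels/preds use True = abnormal (PAS 2-8), False = normal (PAS 1)."""
    pairs = list(zip(labels, preds))
    tp = sum(1 for y, p in pairs if y and p)
    fn = sum(1 for y, p in pairs if y and not p)
    tn = sum(1 for y, p in pairs if not y and not p)
    fp = sum(1 for y, p in pairs if not y and p)
    return {
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "accuracy": _ratio(tp + tn, len(pairs)),
    }


def auc(labels: Sequence[bool], scores: Sequence[float]) -> float:
    """Rank-based AUC: chance that a random abnormal case scores above a random
    normal case, counting ties as one half (Mann-Whitney formulation)."""
    pos = [s for y, s in zip(labels, scores) if y]
    neg = [s for y, s in zip(labels, scores) if not y]
    if not pos or not neg:
        raise ValueError("AUC needs at least one abnormal and one normal case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy reconstruction errors (not study data).
    normal_train = [0.08, 0.09, 0.10, 0.10, 0.11, 0.11, 0.12, 0.12, 0.13, 0.15]
    threshold = pick_threshold(normal_train, quantile=0.9)

    test_labels = [False, False, False, False, True, True, True, True, True, True]
    test_errors = [0.09, 0.12, 0.14, 0.11, 0.20, 0.18, 0.12, 0.25, 0.16, 0.30]

    metrics = screening_metrics(test_labels, classify(test_errors, threshold))
    metrics["auc"] = auc(test_labels, test_errors)
    print(threshold, metrics)
```

Because abnormal cases are never seen during fitting, the threshold encodes only what normal phonation looks like to the model. Note also that accuracy equals prevalence × sensitivity + (1 − prevalence) × specificity, so a test set dominated by abnormal cases can keep accuracy high even when specificity is low.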