Deep learning for early detection of Zenker's diverticulum based on swallowing sound analysis.

Journal: International Journal of Computer Assisted Radiology and Surgery

Abstract

PURPOSE: Patients with Zenker's diverticulum (ZD) often suffer from oropharyngeal dysphagia that can go undiagnosed for years. Diagnosis of ZD typically requires specialized centers and videofluoroscopy. Our study aims to create a noninvasive, accessible, sound-based screening tool that helps healthcare professionals reduce diagnostic barriers and detect ZD earlier. METHODS: We developed a two-stage deep learning model that detects ZD from cervical auscultation sounds. The first stage identifies swallowing sounds (idle vs. swallow), and the second classifies detected swallows as healthy or pathological (healthy vs. ZD). We used transfer learning, fine-tuning a pre-trained audio spectrogram transformer (AST) backbone for our task. Performance was evaluated with a fivefold cross-validation protocol. For data collection, we built a portable cervical auscultation device and recorded 23 ZD patients and 27 healthy volunteers. RESULTS: The proposed method achieved a patient-level ZD diagnosis accuracy of 88.7 ± 7.7% and an F1-score of 87.6 ± 8.3%. We also report snippet-level intermediate results for each individual stage and perform an ablation study to justify our design decisions and benchmark our approach. CONCLUSION: This study demonstrates, to our knowledge, the first deep learning-based cervical auscultation approach for identifying ZD. The results indicate that auscultation-driven AST-based models can provide clinically meaningful sensitivity and may help lower diagnostic barriers, enable earlier referral, and ultimately reduce healthcare costs in dysphagia care.
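The two-stage cascade and patient-level evaluation described in METHODS can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The fine-tuned AST classifiers are replaced by precomputed per-snippet probabilities. The 0.5 thresholds and the mean-probability aggregation from swallows to a patient-level label are hypothetical choices. `Snippet`, `patient_diagnosis`, and `patient_folds` are illustrative names. Folds are split by patient rather than by snippet, an assumed precaution so that no patient's recordings appear in both training and test data.

```python
import random
from dataclasses import dataclass
from statistics import mean


@dataclass
class Snippet:
    patient_id: str
    p_swallow: float  # hypothetical stage-1 output: P(snippet is a swallow, not idle)
    p_zd: float       # hypothetical stage-2 output: P(swallow is pathological, i.e. ZD)


def patient_diagnosis(snippets, swallow_thr=0.5, zd_thr=0.5):
    """Two-stage cascade: gate snippets with stage 1, then aggregate stage-2 scores per patient.

    Assumed aggregation: a patient is labelled ZD when the mean ZD probability over
    their detected swallows reaches zd_thr. Patients with no detected swallows get no label.
    """
    per_patient = {}
    for s in snippets:
        if s.p_swallow >= swallow_thr:  # stage 1: discard idle snippets
            per_patient.setdefault(s.patient_id, []).append(s.p_zd)
    return {pid: mean(scores) >= zd_thr for pid, scores in per_patient.items()}


def patient_folds(patient_ids, k=5, seed=0):
    """Assign each patient (not each snippet) to exactly one of k cross-validation folds."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


if __name__ == "__main__":
    example = [
        Snippet("p1", 0.9, 0.8),
        Snippet("p1", 0.2, 0.1),  # treated as idle by stage 1, so ignored
        Snippet("p1", 0.7, 0.6),
        Snippet("p2", 0.95, 0.1),
        Snippet("p2", 0.8, 0.3),
    ]
    print(patient_diagnosis(example))
```

With 50 participants (23 ZD, 27 healthy), `patient_folds` yields five disjoint folds of 10 patients each. A stratified variant that balances ZD and healthy patients per fold would be a reasonable alternative; the abstract does not say which scheme was used.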
