An explainable and accurate transformer-based deep learning model for wheeze classification utilizing real-world pediatric data.

Journal: Scientific Reports
PMID:

Abstract

Auscultation is a method of diagnosing disease by listening to sounds from the patient's body, mainly with a stethoscope. The stethoscope enables non-invasive, real-time diagnosis, which makes it well suited to diagnosing respiratory diseases and to first-aid settings. However, accurate interpretation of respiratory sounds with a stethoscope is subjective and requires considerable clinical expertise. To overcome these limitations of conventional stethoscopes, researchers are actively developing artificial intelligence (deep learning) models that can interpret breath sounds recorded with electronic stethoscopes. Most recent studies in this area have focused on convolutional neural network (CNN)-based respiratory sound classification models. However, CNN models are limited in their ability to interpret sounds whose classification depends on a longer overall duration and more detailed context. Therefore, in the present work, we apply the Transformer-based Audio Spectrogram Transformer (AST) model to data collected in actual clinical practice. This prospective study targeted children who visited the pediatric departments of two university hospitals in South Korea from 2019 to 2020. A pediatric pulmonologist recorded breath sounds, and a pediatric breath sound dataset was constructed through double-blind verification. We then developed a deep learning model by applying the pre-trained weights of the AST model to our data, which comprised 194 wheezes and 531 other respiratory sounds. We compared the performance of the proposed model with that of a previously published CNN-based model and also tested it on datasets from previous studies. To assess the reliability of the proposed model, we visualized its classification process using Score-Weighted Class Activation Mapping (Score-CAM). Our model achieved an accuracy of 91.1%, an area under the curve (AUC) of 86.6%, a precision of 88.2%, a recall of 76.9%, and an F1-score of 82.2%.
Ultimately, the proposed Transformer-based model detected wheezing with high accuracy, and the model's decision-making process was also shown to be reliable. The deep learning model developed in this study is expected to help clinicians accurately diagnose pediatric respiratory diseases in real-world clinical practice.
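The reported F1-score is consistent with the reported precision and recall, because F1 is their harmonic mean. As a minimal sketch (the function names below are hypothetical and not taken from the paper), the standard binary metrics can be computed from confusion-matrix counts, treating wheeze as the positive class and other respiratory sounds as the negative class. AUC is left out because it requires per-recording scores rather than counts, and the abstract does not report the test-set confusion matrix.

```python
def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall, and F1 from confusion-matrix counts.

    Positive class = wheeze, negative class = other respiratory sounds (assumption).
    """
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = f1_from_precision_recall(precision, recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


def f1_from_precision_recall(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0 when both are 0)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Consistency check using the precision (88.2%) and recall (76.9%) from the abstract.
    print(round(f1_from_precision_recall(0.882, 0.769), 3))  # 0.822
```

Because F1 ignores true negatives, it is more informative than accuracy on an imbalanced set such as 194 wheezes versus 531 other sounds.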
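Score-CAM explains a prediction by masking the input with each normalized activation map, measuring the model's confidence in the target class for that masked input, and using those confidences to weight a sum of the maps. The abstract does not say which AST layer supplied the activation maps or how the Transformer's patch tokens were arranged into a spectrogram-shaped grid, so this NumPy sketch assumes the maps are already resized to the input's (frequency, time) shape and treats the classifier as a hypothetical `model_fn` that returns class logits. It uses the masked input's target-class probability directly as each weight, a common simplification of the original baseline-subtracted score.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D vector of logits."""
    shifted = np.exp(logits - np.max(logits))
    return shifted / shifted.sum()


def score_cam(model_fn, spectrogram, activations, target_class):
    """Score-CAM saliency map for a single input (sketch).

    model_fn:     hypothetical callable mapping an (F, T) array to class logits
    spectrogram:  input spectrogram, shape (F, T)
    activations:  K activation maps already resized to (F, T), shape (K, F, T)
    target_class: index of the class to explain (e.g. wheeze)
    """
    weights = np.empty(activations.shape[0])
    for k, act in enumerate(activations):
        lo, hi = act.min(), act.max()
        # Min-max normalize the map to [0, 1] so it can act as a soft mask.
        mask = (act - lo) / (hi - lo) if hi > lo else np.zeros_like(act)
        # Weight = target-class probability when only the masked region is kept.
        weights[k] = softmax(model_fn(spectrogram * mask))[target_class]
    # Weighted sum of the (unnormalized) activation maps, followed by ReLU.
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)
    # Rescale to [0, 1] for display unless the map is all zeros.
    return cam / cam.max() if cam.max() > 0 else cam


if __name__ == "__main__":
    # Toy check: class 0 depends only on the left half of the input.
    spec = np.ones((4, 4))
    acts = np.zeros((2, 4, 4))
    acts[0, :, :2] = 1.0  # map covering the left half
    acts[1, :, 2:] = 1.0  # map covering the right half
    toy_model = lambda x: np.array([x[:, :2].sum(), x[:, 2:].sum()])
    # Left half comes out near 1, right half near 0.
    print(score_cam(toy_model, spec, acts, target_class=0).round(3))
```

Unlike gradient-based CAM variants, this approach needs only forward passes, which is why it can be applied to a Transformer once its internal representations are mapped back onto the spectrogram.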

Authors

  • Beom Joon Kim
    Department of Dermatology, Chung-Ang University College of Medicine, Seoul, Korea.
  • Jeong Hyeon Mun
    Department of Applied Statistics, Chung-Ang University, 84 Heukseok-Ro, Dongjak-Gu, Seoul, 06974, Republic of Korea.
  • Dae Hwan Hwang
    Department of Statistics and Data Science, Chung-Ang University, Seoul, Republic of Korea.
  • Dong In Suh
    Department of Pediatrics, Seoul National University College of Medicine, Seoul, Republic of Korea.
  • Changwon Lim
    Department of Applied Statistics, Chung-Ang University, 84 Heukseok-Ro, Dongjak-Gu, Seoul, 06974, Republic of Korea. clim@cau.ac.kr.
  • Kyunghoon Kim
    Department of Pediatrics, College of Medicine, The Catholic University of Korea, Seoul, Korea.