Exploring classical machine learning for identification of pathological lung auscultations.

Journal: Computers in Biology and Medicine
Published Date:

Abstract

The use of machine learning in biomedical research has surged in recent years, driven by advances in recording devices and artificial intelligence. Our aim is to extend this body of knowledge by applying machine learning to pulmonary auscultation signals. Digital stethoscopes have improved, and there have been attempts to combine them with artificial intelligence, yet solutions suitable for clinical use remain scarce. Physicians still reach initial diagnoses by less sophisticated means. The resulting low accuracy can lead to suboptimal patient care. A correct preliminary diagnosis requires highly accurate auscultation. Because auscultation is performed so often, large amounts of data are available, which opens up opportunities for more effective sound analysis. In this study, digital 6-channel auscultation recordings from 45 patients were used in various machine learning scenarios to distinguish normal from abnormal pulmonary sounds. Audio features were extracted with the Python library Surfboard. These included the fundamental frequency F0, formants F1-F4, loudness, harmonics-to-noise ratio (HNR), detrended fluctuation analysis (DFA), and descriptive statistics of log energy, root mean square (RMS) energy and mel-frequency cepstral coefficients (MFCCs). Windowing, feature aggregation and concatenation strategies were used to prepare the data for machine learning algorithms in unsupervised (fair-cut forest, outlier forest) and supervised (random forest, regularized logistic regression) settings. Evaluation used 9-fold stratified cross-validation repeated 30 times. Decision fusion was also tested and found to be helpful; it averages the model outputs belonging to one subject. Supervised models showed a consistent advantage over unsupervised ones. Random forest achieved a mean ROC AUC of 0.691 (accuracy 71.11%, Kappa 0.416, F1-score 0.675) in side-based detection and a mean ROC AUC of 0.721 (accuracy 68.89%, Kappa 0.371, F1-score 0.650) in patient-based detection.
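The abstract names several pipeline stages but gives no code. The following is only a rough, hypothetical illustration of three of those stages in plain Python:

- splitting a signal into windows and aggregating per-window RMS and log energy into fixed-length mean/standard-deviation statistics;
- decision fusion by averaging all scores that belong to one subject;
- computing ROC AUC.

The function names, frame length, hop size, the choice of mean and standard deviation as aggregates, and the toy scores are all assumptions made for illustration. This is not the authors' implementation, and it does not reproduce the Surfboard feature extractors used in the study.

```python
import math
from collections import defaultdict
from statistics import mean, stdev


def frame_rms_log_energy(signal, frame_len, hop):
    """Split a 1-D signal into frames of frame_len samples, hop samples apart,
    and return one (RMS, log energy) pair per frame (hypothetical helper)."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        energy = sum(x * x for x in frame)
        rms = math.sqrt(energy / frame_len)
        log_energy = math.log(energy + 1e-12)  # epsilon guards against log(0) on silent frames
        feats.append((rms, log_energy))
    return feats


def aggregate(frame_feats):
    """Collapse a variable number of frames into a fixed-length vector:
    mean and standard deviation of each per-frame feature (needs >= 2 frames)."""
    columns = zip(*frame_feats)  # one tuple of values per feature
    return [stat for col in columns for stat in (mean(col), stdev(col))]


def fuse_by_subject(subject_ids, scores):
    """Decision fusion (assumed rule): average all scores that share a subject id."""
    grouped = defaultdict(list)
    for sid, score in zip(subject_ids, scores):
        grouped[sid].append(score)
    return {sid: mean(vals) for sid, vals in grouped.items()}


def roc_auc(labels, scores):
    """ROC AUC as the probability that a random positive (label 1) scores higher
    than a random negative (label 0); tied pairs count as 1/2."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy signal and made-up scores; none of these numbers come from the study.
    signal = [math.sin(0.3 * t) for t in range(400)]
    frames = frame_rms_log_energy(signal, frame_len=100, hop=50)
    print(len(frames), len(aggregate(frames)))  # 7 4

    # Hypothetical side-level (left/right) scores for four patients.
    side_ids = ["p1", "p1", "p2", "p2", "p3", "p3", "p4", "p4"]
    side_scores = [0.9, 0.7, 0.2, 0.4, 0.3, 0.5, 0.6, 0.4]
    patient_label = {"p1": 1, "p2": 0, "p3": 1, "p4": 0}

    fused = fuse_by_subject(side_ids, side_scores)
    ids = list(fused)
    print(roc_auc([patient_label[p] for p in ids], [fused[p] for p in ids]))  # 0.75
```

Averaging is just one simple fusion rule; others, such as taking the maximum score, are equally easy to plug in. The pairwise AUC above compares every positive with every negative, which is adequate for a few dozen patients. A rank-based formula, or a library routine such as scikit-learn's `roc_auc_score`, scales better.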

Authors

  • Haroldas Razvadauskas
    Lithuanian University of Health Sciences, Kaunas, Lithuania.
  • Evaldas Vaičiukynas
    Faculty of Informatics, Kaunas University of Technology, Studentu 50, LT-51368 Kaunas, Lithuania.
  • Kazimieras Buškus
    Faculty of Mathematics and Natural Sciences, Kaunas University of Technology, Studentu 50, LT-51368 Kaunas, Lithuania.
  • Lukas Arlauskas
    Kaunas University of Technology, Kaunas, Lithuania.
  • Sławomir Nowaczyk
    Center for Applied Intelligent Systems Research, Halmstad University, Sweden.
  • Saulius Sadauskas
    Lithuanian University of Health Sciences, Kaunas, Lithuania.
  • Albinas Naudžiūnas
    Lithuanian University of Health Sciences, Kaunas, Lithuania.