Performance improvement of machine learning techniques predicting the association of exacerbation of peak expiratory flow ratio with short term exposure level to indoor air quality using adult asthmatics clustered data.

Journal: PloS one
PMID:

Abstract

Large-scale data sources, remote sensing technologies, and superior computing power have greatly benefited environmental health research. Recently, various machine-learning algorithms have been introduced to provide mechanistic insights into the heterogeneity of clustered data pertaining to each asthma patient's symptoms and potential environmental risk factors. However, there is limited information on the performance of these machine-learning tools. In this study, we compared the performance of ten machine-learning techniques. Using an advanced imbalanced sampling (IS) method, we improved the performance of nine conventional machine-learning techniques in predicting the association between indoor air quality exposure levels and changes in patients' peak expiratory flow rate (PEFR). We then proposed a deep learning method, transfer learning (TL), to further improve prediction accuracy. Our selected final prediction techniques (TL1_IS and TL2_IS) achieved a median (interquartile range) balanced accuracy of 66% (56–76%) for TL1_IS and 68% (63–78%) for TL2_IS. Precision was 68% (62–72%) for TL1_IS and 66% (62–69%) for TL2_IS, while sensitivity was 58% (50–67%) and 59% (51–80%), respectively, across 25 patients. Compared with NN_IS, these results correspond to approximately 1.08-fold (accuracy, precision) to 1.28-fold (sensitivity) improvements in performance. Our results indicate that transfer learning with imbalanced sampling is a powerful tool for predicting changes in PEFR due to indoor air exposure, including concentrations of particulate matter 2.5 μm or less in diameter (PM2.5) and carbon dioxide. This modeling technique is applicable even to small or imbalanced datasets, which are typical of personalized, real-world settings.
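The abstract summarizes each model's per-patient performance as a median (interquartile range) across 25 patients. The sketch below shows one way such figures could be computed. It assumes binary labels per observation (1 = PEFR exacerbation, 0 = none) and the standard definitions: balanced accuracy is the mean of sensitivity and specificity, and precision is TP / (TP + FP). The function names are hypothetical, the quartile convention (`statistics.quantiles` with `method="inclusive"`) may differ from the one used in the study, and the toy labels are made up, not study data.

```python
"""Per-patient metrics and median (IQR) summary: an illustrative sketch.

Assumptions (not taken from the study): labels are binary, with
1 = PEFR exacerbation and 0 = no exacerbation; quartiles use the
'inclusive' method of statistics.quantiles.
"""
import statistics
from typing import Dict, List, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Count (tp, fp, tn, fn) for one patient's binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def safe_div(num: int, den: int) -> float:
    """Return num / den, or 0.0 when the denominator is zero (a convention choice)."""
    return num / den if den else 0.0


def patient_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Balanced accuracy, precision, and sensitivity for one patient."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = safe_div(tp, tp + fn)  # recall on exacerbation observations
    specificity = safe_div(tn, tn + fp)  # recall on non-exacerbation observations
    return {
        # Mean of per-class recall; unlike plain accuracy, it is not
        # inflated by always predicting the majority class.
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "precision": safe_div(tp, tp + fp),
        "sensitivity": sensitivity,
    }


def median_iqr(values: Sequence[float]) -> Tuple[float, float, float]:
    """Return (median, Q1, Q3); needs at least two values on Python < 3.13."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return statistics.median(values), q1, q3


def summarize(per_patient: List[Dict[str, float]]) -> Dict[str, Tuple[float, float, float]]:
    """Median (IQR) of each metric across patients."""
    return {
        name: median_iqr([m[name] for m in per_patient])
        for name in per_patient[0]
    }


if __name__ == "__main__":
    # Toy labels for two hypothetical patients (not study data).
    patients = [
        ([1, 1, 0, 0, 0, 1], [1, 0, 0, 1, 0, 1]),
        ([0, 0, 0, 1, 1, 0, 0, 1], [0, 1, 0, 1, 0, 0, 0, 1]),
    ]
    results = [patient_metrics(t, p) for t, p in patients]
    for name, (med, q1, q3) in summarize(results).items():
        print(f"{name}: {med:.0%} ({q1:.0%}–{q3:.0%})")
```

Balanced accuracy is reported alongside precision and sensitivity because, on imbalanced per-patient data, a model that rarely predicts an exacerbation can still score high plain accuracy.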

Authors

  • Wan D Bae
    Department of Computer Science, Seattle University, Seattle, Washington, United States of America.
  • Sungroul Kim
    Department of ICT Environmental Health System, Graduate School, Soonchunhyang University, Asan 31538, Korea.
  • Choon-Sik Park
    Department of Internal Medicine, Soonchunhyang Bucheon Hospital, Wonmi-gu, Bucheon-si, Gyeonggi-do, South Korea.
  • Shayma Alkobaisi
    College of Information Technology, United Arab Emirates University, Abu Dhabi, UAE.
  • Jongwon Lee
    Center for Single Atom-based Semiconductor Device and Department of Materials Science and Engineering, Pohang University of Science and Technology (POSTECH), Pohang, Republic of Korea.
  • Wonseok Seo
    Department of Computer Science, Seattle University, Seattle, Washington, United States of America.
  • Jong Sook Park
    Department of Internal Medicine, Soonchunhyang Bucheon Hospital, Wonmi-gu, Bucheon-si, Gyeonggi-do, South Korea.
  • Sujung Park
    Department of ICT Environmental Health System, Graduate School, Soonchunhyang University, Asan, South Korea.
  • Sangwoon Lee
    Department of ICT Environmental Health System, Graduate School, Soonchunhyang University, Asan, South Korea.
  • Jong Wook Lee
    Department of Otolaryngology-Head & Neck Surgery, Sunnybrook Health Sciences Center, 2075 Bayview Avenue, Toronto, Ontario, M4N 3M5, Canada.