Prediction and interpretation of antibiotic-resistance genes occurrence at recreational beaches using machine learning models.

Journal: Journal of environmental management
Published Date:

Abstract

Antibiotic-resistant bacteria and antibiotic resistance genes (ARGs) are pollutants of worldwide concern that seriously threaten public health and ecosystems. Machine learning (ML) models have been applied to predict ARGs in beach waters. However, existing studies were conducted at a single location and reported low prediction performance. Moreover, ML models are "black boxes" that do not reveal the internal mechanisms behind their predictions. The resulting lack of transparency undermines trust and can have serious consequences when these models inform high-stakes decisions. In this study, we developed a gradient boosted regression tree (GBRT) based ML model and then described its behavior using six explainable artificial intelligence (XAI) model-agnostic explanation methods. We used hydro-meteorological and qPCR data from beaches in South Korea and Pakistan to develop ML prediction models for aac(6')-Ib-cr, sul1, and tetX, which achieved 10-fold time-blocked cross-validation root mean squared logarithmic errors of 4.9, 2.06, and 4.4, respectively. We then analyzed the local and global behavior of the developed ML model using four interpretation methods. The developed ML models showed that water temperature, precipitation, and tide are the most important predictors of ARGs at recreational beaches. We show that the model-agnostic interpretation methods not only explain the behavior of the ML model but also provide insights into how it behaves under new, unseen conditions. Moreover, these post-processing techniques can serve as a debugging tool for ML-based modeling.
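As an illustration of the evaluation setup named in the abstract, the following is a minimal Python sketch of the root mean squared logarithmic error and of a time-blocked 10-fold split. It is not the authors' code: the function names `rmsle` and `time_blocked_folds` are hypothetical, the metric is assumed to use the common log(1 + x) form, and "time-blocked" is assumed to mean that each test fold is one contiguous block of time-ordered samples.

```python
import math


def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error for non-negative values.

    Assumes the common log(1 + x) form of the metric.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [
        (math.log1p(p) - math.log1p(t)) ** 2
        for t, p in zip(y_true, y_pred)
    ]
    return math.sqrt(sum(squared) / len(squared))


def time_blocked_folds(n_samples, n_folds=10):
    """Yield (train_idx, test_idx) pairs for time-ordered samples.

    Assumption: each test fold is a single contiguous block, so samples
    taken close together in time are never split between train and test
    within one block.
    """
    if not 2 <= n_folds <= n_samples:
        raise ValueError("need 2 <= n_folds <= n_samples")
    base, extra = divmod(n_samples, n_folds)
    start = 0
    for k in range(n_folds):
        size = base + (1 if k < extra else 0)  # spread the remainder
        stop = start + size
        test_idx = list(range(start, stop))
        train_idx = list(range(0, start)) + list(range(stop, n_samples))
        yield train_idx, test_idx
        start = stop
```

In such a setup, a model would be fitted on each `train_idx` and scored with `rmsle` on the matching `test_idx`, and the per-fold scores would then be summarized.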

Authors

  • Sara Iftikhar
    Department of Electrical Engineering and Computer Sciences, National University of Sciences and Technology (NUST), Islamabad 64000, Pakistan.
  • Asad Mustafa Karim
    Department of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, Republic of Korea.
  • Aoun Murtaza Karim
    Institute of Geology and Geophysics, University of Chinese Academy of Sciences, Beijing, China; Institute of Geology, University of the Punjab, Lahore 54590, Pakistan.
  • Mujahid Aizaz Karim
    Sheikh Zayed Medical College/Hospital, Rahim Yar Khan, Pakistan.
  • Muhammad Aslam
    Department of Chemical Engineering, COMSATS University Islamabad, Lahore Campus, Defense Road, Off Raiwind Road, Lahore, Pakistan.
  • Fazila Rubab
    Department of Electrical and Computer Engineering, COMSATS University Islamabad, Wah Campus, Wah Cantt, 47040, Pakistan.
  • Sumera Kausar Malik
    Department of Bioscience and Biotechnology, The University of Suwon, Hwaseong-si, Gyeonggi-do 18323, Republic of Korea.
  • Jeong Eun Kwon
    Department of Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si 17104, Republic of Korea.
  • Imran Hussain
    School of Life & Allied Health Science, The Glocal University, Saharanpur, UP, India.
  • Esam I Azhar
    Special Infectious Agents Unit, King Fahd Medical Research Center, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Department of Medical Laboratory Sciences, Faculty of Applied Medical Sciences, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
  • Se Chan Kang
    Department of Oriental Medicine Biotechnology, College of Life Sciences, Kyung Hee University, Yongin-si, Kyunggi-do, 17104, Korea.
  • Muhammad Yasir
    College of Oceanography and Space Informatics, China University of Petroleum, Qingdao, China.