Using Machine Learning To Predict Antimicrobial MICs and Associated Genomic Features for Nontyphoidal Salmonella

Journal: Journal of clinical microbiology
PMID:

Abstract

Nontyphoidal Salmonella species are the leading bacterial cause of foodborne disease in the United States. Because of surveillance efforts by public health agencies, whole-genome sequences and paired antimicrobial susceptibility data are available for Salmonella strains. In this study, a collection of 5,278 nontyphoidal Salmonella genomes, collected over 15 years in the United States, was used to build extreme gradient boosting (XGBoost)-based machine learning models that predict MICs for 15 antibiotics. The MIC prediction models had an overall average accuracy of 95% within ±1 twofold dilution step (confidence interval, 95% to 95%), an average very major error rate of 2.7% (confidence interval, 2.4% to 3.0%), and an average major error rate of 0.1% (confidence interval, 0.1% to 0.2%). The models predicted MICs without any information about the underlying gene content or resistance phenotypes of the strains. By selecting diverse genomes for the training sets, we show that highly accurate MIC prediction models can be generated with fewer than 500 genomes. We also show that our approach for predicting MICs is stable over time, despite annual fluctuations in antimicrobial resistance gene content in the sampled genomes. Finally, using feature selection, we explore the genomic regions that the models identify as important for predicting MICs. To date, this is one of the largest MIC modeling studies to be published. Our strategy for developing whole-genome sequence-based models for surveillance and clinical diagnostics can be readily applied to other important human pathogens.
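The abstract reports accuracy "within ±1 twofold dilution step" alongside very major and major error rates, but does not spell out how these are computed. The sketch below illustrates one common reading of these metrics and is not the authors' evaluation code. It assumes that a prediction counts as correct when it lies at most one log2 step from the laboratory MIC, that the very major error (VME) rate is the fraction of truly resistant isolates predicted susceptible, and that the major error (ME) rate is the fraction of truly susceptible isolates predicted resistant. The susceptible/resistant breakpoints (`s_bp`, `r_bp`) and the example MIC values are hypothetical placeholders, not values from the study.

```python
import math
from typing import List, Sequence, Tuple


def dilution_steps(a: float, b: float) -> int:
    """Number of twofold dilution steps separating two MICs (same units)."""
    return abs(round(math.log2(a) - math.log2(b)))


def within_one_dilution_accuracy(true_mics: Sequence[float],
                                 pred_mics: Sequence[float]) -> float:
    """Fraction of predictions within +/-1 twofold dilution of the true MIC."""
    hits = sum(1 for t, p in zip(true_mics, pred_mics) if dilution_steps(t, p) <= 1)
    return hits / len(true_mics)


def categorize(mic: float, s_bp: float, r_bp: float) -> str:
    """Map an MIC to S/I/R using hypothetical breakpoints (S if <= s_bp, R if >= r_bp)."""
    if mic <= s_bp:
        return "S"
    if mic >= r_bp:
        return "R"
    return "I"


def error_rates(true_mics: Sequence[float], pred_mics: Sequence[float],
                s_bp: float, r_bp: float) -> Tuple[float, float]:
    """Return (VME rate, ME rate) under the assumed denominators described above."""
    true_cat: List[str] = [categorize(m, s_bp, r_bp) for m in true_mics]
    pred_cat: List[str] = [categorize(m, s_bp, r_bp) for m in pred_mics]
    resistant = [i for i, c in enumerate(true_cat) if c == "R"]
    susceptible = [i for i, c in enumerate(true_cat) if c == "S"]
    # VME: truly resistant isolate called susceptible.
    vme = sum(pred_cat[i] == "S" for i in resistant) / len(resistant) if resistant else 0.0
    # ME: truly susceptible isolate called resistant.
    me = sum(pred_cat[i] == "R" for i in susceptible) / len(susceptible) if susceptible else 0.0
    return vme, me


if __name__ == "__main__":
    # Hypothetical MICs (mg/L) for six isolates and one antibiotic.
    true_mics = [0.5, 1, 2, 16, 32, 4]
    pred_mics = [1, 1, 8, 4, 32, 4]
    print(within_one_dilution_accuracy(true_mics, pred_mics))
    print(error_rates(true_mics, pred_mics, s_bp=4, r_bp=16))
```

In this toy case, four of six predictions fall within one dilution, one of the two resistant isolates is called susceptible (a VME), and no susceptible isolate is called resistant. Published studies may use different denominators (for example, all tested isolates), so reported rates are only comparable when the definitions match.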

Authors

  • Marcus Nguyen
    University of Chicago Consortium for Advanced Science and Engineering, University of Chicago, Chicago, Illinois, USA.
  • S Wesley Long
    Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA.
  • Patrick F McDermott
    U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA.
  • Randall J Olsen
    Center for Molecular and Translational Human Infectious Diseases Research, Department of Pathology and Genomic Medicine, Houston Methodist Research Institute and Houston Methodist Hospital, Houston, Texas, USA.
  • Robert Olson
    Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, Illinois, USA.
  • Rick L Stevens
    Computer Science Department and Computation Institute, University of Chicago, Chicago, Illinois, USA.
  • Gregory H Tyson
    U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA.
  • Shaohua Zhao
    U.S. Food and Drug Administration, Center for Veterinary Medicine, Office of Research, Laurel, Maryland, USA.
  • James J Davis
    Computing, Environment, and Life Sciences Directorate, Argonne National Laboratory, Argonne, Illinois, USA.