Predicting Host Association for Shiga Toxin-Producing E. coli Serogroups by Machine Learning.

Journal: Methods in molecular biology (Clifton, N.J.)
Published Date:

Abstract

Escherichia coli is a bacterial species found in a wide variety of mammalian hosts and potentially in soil environments. E. coli has an open genome, and gene content can differ considerably between isolates. It is reasonable to assume that gene content reflects the evolution of strains in particular host environments and can therefore be used to predict the host most likely to be the source of an isolate. By extension, some strains may carry gene content that favors success in multiple hosts, so the likelihood of successful transmission from one host to another, for example from cattle to humans, can also be predicted from gene content. In this methods chapter, we consider Shiga toxin (Stx)-producing E. coli (STEC) strains, for which ruminants are the main host reservoir and a subset of which is known to cause life-threatening infections in humans. We show how the genome sequences of E. coli isolated from both cattle and humans can be used to build a classifier that predicts human and cattle host association, and how this classifier can be applied to score key STEC serogroups known to be associated with human infection. With the example dataset used, serogroups O157, O26, and O111 show the highest, and O103 and O145 the lowest, predictions for human association. The long-term ambition is to combine such machine learning predictions with phylogeny to predict the zoonotic threat of an isolate from its whole genome sequence (WGS).
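
The workflow described in the abstract (train a classifier on gene content from cattle and human isolates, then score isolates grouped by serogroup) can be illustrated with a minimal sketch. The abstract does not name the classifier used in the chapter, so this example swaps in a simple Bernoulli naive Bayes model over gene presence/absence, written with the Python standard library only. The gene names, serogroup labels, toy isolates, and function names are all hypothetical placeholders and carry no biological meaning.

```python
import math
from collections import defaultdict


def train_bernoulli_nb(profiles, hosts, alpha=1.0):
    """Fit a Bernoulli naive Bayes model on gene presence/absence.

    profiles: list of sets, each holding the genes present in one isolate.
    hosts:    list of host labels (e.g. "human", "cattle"), one per isolate.
    alpha:    Laplace smoothing, so unseen genes never give probability 0 or 1.
    """
    genes = sorted(set().union(*profiles))
    present = defaultdict(lambda: defaultdict(int))  # host -> gene -> count
    totals = defaultdict(int)                        # host -> isolate count
    for profile, host in zip(profiles, hosts):
        totals[host] += 1
        for gene in profile:
            present[host][gene] += 1

    model = {"genes": genes, "prior": {}, "p_present": {}}
    for host, n_host in totals.items():
        model["prior"][host] = n_host / len(hosts)
        model["p_present"][host] = {
            g: (present[host][g] + alpha) / (n_host + 2 * alpha) for g in genes
        }
    return model


def predict_proba(model, profile):
    """Return {host: posterior probability} for one isolate's gene set."""
    log_scores = {}
    for host, prior in model["prior"].items():
        score = math.log(prior)
        for g in model["genes"]:
            p = model["p_present"][host][g]
            score += math.log(p if g in profile else 1.0 - p)
        log_scores[host] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    weights = {h: math.exp(s - top) for h, s in log_scores.items()}
    total = sum(weights.values())
    return {h: w / total for h, w in weights.items()}


def mean_score_by_serogroup(model, isolates, host="human"):
    """Average the predicted probability for `host` within each serogroup.

    isolates: list of (serogroup_label, gene_set) pairs.
    """
    grouped = defaultdict(list)
    for serogroup, profile in isolates:
        grouped[serogroup].append(predict_proba(model, profile)[host])
    return {sg: sum(v) / len(v) for sg, v in grouped.items()}


# Hypothetical toy training data: gene names are placeholders only.
train_profiles = [
    {"gene_A", "gene_B"}, {"gene_A", "gene_C"}, {"gene_A", "gene_B", "gene_C"},
    {"gene_D"}, {"gene_C", "gene_D"}, {"gene_B", "gene_D"},
]
train_hosts = ["human", "human", "human", "cattle", "cattle", "cattle"]
model = train_bernoulli_nb(train_profiles, train_hosts)

# Hypothetical serogroup labels, not real O-groups.
stec_isolates = [
    ("serogroup_X", {"gene_A", "gene_B"}),
    ("serogroup_X", {"gene_A", "gene_C", "gene_D"}),
    ("serogroup_Y", {"gene_D"}),
]
serogroup_scores = mean_score_by_serogroup(model, stec_isolates)
```

In a real analysis the gene sets would come from annotated whole genome sequences, the model would be evaluated on held-out isolates, and a classifier better suited to thousands of correlated gene features would likely replace naive Bayes. The per-serogroup average is only one possible way to summarise isolate-level predictions.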

Authors

  • Nadejda Lupolova
    Division of Immunity and Infection, The Roslin Institute and The Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian EH25 9RG, United Kingdom.
  • Antonia Chalka
    Division of Infection and Immunity, The Roslin Institute and Royal (Dick) School of Veterinary Studies, University of Edinburgh, Edinburgh, UK.
  • David L Gally
    Division of Immunity and Infection, The Roslin Institute and The Royal (Dick) School of Veterinary Studies, The University of Edinburgh, Midlothian EH25 9RG, United Kingdom; dgally@ed.ac.uk.