Using minor variant genomes and machine learning to study the genome biology of SARS-CoV-2 over time.

Journal: Nucleic acids research
PMID:

Abstract

In infected individuals, viruses are present as a population consisting of dominant and minor variant genomes. Most databases contain information on the dominant genome sequence. Since the emergence of SARS-CoV-2 in late 2019, variants have been selected that are more transmissible and capable of partial immune escape. Current models for projecting the evolution of SARS-CoV-2 use dominant genome sequences to forecast whether a known mutation will become prevalent in the future. However, novel variants of SARS-CoV-2 (and other viruses) arise from evolutionary pressure acting on minor variant genomes, which can then become dominant and form a potential next wave of infection. In this study, sequencing data from 96 209 patients, sampled over a 3-year period, were used to analyse patterns of minor variant genomes. These data were clustered using unsupervised machine learning to identify amino acids in the Spike protein that had a greater potential for mutation than others. Being able to identify amino acids that may be present in future variants would better inform the design of longer-lived medical countermeasures and allow a risk-based evaluation of viral properties, including assessment of transmissibility and immune escape, thus providing candidate early warning signals for when a new variant of SARS-CoV-2 emerges.
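
The distinction between dominant and minor variants at a single site can be illustrated with a minimal Python sketch. It is not the study's pipeline and does not attempt the clustering step. The function name `call_variants`, the `min_fraction` noise cutoff of 0.05, and the toy read counts are all assumptions made for illustration only.

```python
from collections import Counter

def call_variants(residues, min_fraction=0.05):
    """Split the residues observed at one site into dominant and minor variants.

    residues: iterable of single-letter amino acids, one per read covering the site.
    min_fraction: hypothetical noise cutoff; minor variants below it are ignored.
    """
    counts = Counter(residues)
    depth = sum(counts.values())
    if depth == 0:
        raise ValueError("no reads cover this site")
    # The most frequent residue is the dominant variant; ties go to the first seen.
    dominant, dominant_count = counts.most_common(1)[0]
    # Every other residue at or above the cutoff is reported as a minor variant.
    minor = {
        aa: n / depth
        for aa, n in counts.items()
        if aa != dominant and n / depth >= min_fraction
    }
    return dominant, dominant_count / depth, minor

# Toy data (invented): 100 reads covering one codon, already translated.
reads = ["N"] * 88 + ["Y"] * 10 + ["K"] * 2
dominant, dominant_freq, minor = call_variants(reads)
print(dominant, dominant_freq, minor)
```

In this toy sample a consensus-only database would record just the dominant residue, while the minor variant frequencies are the kind of per-site signal the abstract describes analysing across patients.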

Authors

  • Xiaofeng Dong
    Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.
  • David A Matthews
    School of Cellular and Molecular Medicine, University of Bristol, Bristol, BS8 1TD, United Kingdom.
  • Giulia Gallo
    The Pirbright Institute, Pirbright, Woking, GU24 0NF, United Kingdom.
  • Alistair Darby
    Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.
  • I'ah Donovan-Banfield
    Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.
  • Hannah Goldswain
    Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.
  • Tracy MacGill
    Office of Counterterrorism and Emerging Threats, U.S. Food and Drug Administration, Silver Spring, MD 20993-0002, United States.
  • Todd Myers
    Office of Counterterrorism and Emerging Threats, U.S. Food and Drug Administration, Silver Spring, MD 20993-0002, United States.
  • Robert Orr
    Office of Counterterrorism and Emerging Threats, U.S. Food and Drug Administration, Silver Spring, MD 20993-0002, United States.
  • Dalan Bailey
    The Pirbright Institute, Pirbright, Woking, GU24 0NF, United Kingdom.
  • Miles W Carroll
    NIHR Health Protection Research Unit in Emerging and Zoonotic Infections, Liverpool, L69 7BE, United Kingdom.
  • Julian A Hiscox
    Institute of Infection, Veterinary and Ecological Sciences, Faculty of Health and Life Sciences, University of Liverpool, Liverpool, L3 5RF, United Kingdom.