Forecasting dominance of SARS-CoV-2 lineages by anomaly detection using deep AutoEncoders.

Journal: Briefings in Bioinformatics
Published Date:

Abstract

The COVID-19 pandemic is marked by the successive emergence of new SARS-CoV-2 variants, lineages, and sublineages that outcompete earlier strains, largely owing to factors such as increased transmissibility and immune escape. We propose DeepAutoCoV, an unsupervised deep learning anomaly detection system, to predict future dominant lineages (FDLs). We define FDLs as viral (sub)lineages that will constitute >10% of all viral sequences added in a given week to GISAID, a public database supporting viral genetic sequence sharing. DeepAutoCoV is trained and validated on global and country-specific data sets assembled from over 16 million Spike protein sequences sampled over a period of ~4 years. DeepAutoCoV successfully flags FDLs at very low frequencies (0.01%-3%), with median lead times of 4-17 weeks, and predicts FDLs ~5 to ~25 times better than a baseline approach. For example, the B.1.617.2 vaccine reference strain was flagged as an FDL when its frequency was only 0.01%, more than a year before it was considered for an updated COVID-19 vaccine. Furthermore, DeepAutoCoV produces interpretable results by pinpointing specific mutations potentially linked to increased fitness, and it may provide significant insights for optimizing 'pre-emptive' public health intervention strategies.
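As a minimal sketch, the FDL definition above (a lineage making up >10% of the sequences added in a given week) can be computed from per-sequence lineage labels, and an autoencoder's reconstruction errors can be turned into anomaly flags. The (week, lineage) input format, the function names `dominant_lineages_by_week` and `flag_anomalies`, and the mean + k·std cutoff are illustrative assumptions, not DeepAutoCoV's published implementation; the autoencoder itself is omitted.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

# Share of a week's submitted sequences above which a lineage counts as dominant (>10%).
FDL_THRESHOLD = 0.10


def dominant_lineages_by_week(records, threshold=FDL_THRESHOLD):
    """Return {week: set of lineages whose share of that week's sequences > threshold}.

    `records` is an iterable of (week, lineage) pairs, one per sequence
    (a hypothetical input format, not GISAID's actual schema).
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1
    result = {}
    for week, lineage_counts in counts.items():
        total = sum(lineage_counts.values())
        # Strictly greater than the threshold, matching ">10%".
        result[week] = {
            lineage for lineage, n in lineage_counts.items() if n / total > threshold
        }
    return result


def flag_anomalies(train_errors, new_errors, k=3.0):
    """Flag new inputs whose reconstruction error exceeds mean + k * std of the
    training errors (an assumed, generic thresholding rule)."""
    cutoff = mean(train_errors) + k * pstdev(train_errors)
    return [err > cutoff for err in new_errors]


if __name__ == "__main__":
    # Toy week with 100 sequences: 85% B.1, 12% B.1.617.2, 3% B.1.1.7.
    week = "2021-W14"
    records = [(week, "B.1")] * 85 + [(week, "B.1.617.2")] * 12 + [(week, "B.1.1.7")] * 3
    print(sorted(dominant_lineages_by_week(records)[week]))  # ['B.1', 'B.1.617.2']

    # Toy reconstruction errors: the second new input is far outside the training range.
    print(flag_anomalies([0.10, 0.12, 0.11, 0.09, 0.13], [0.12, 0.45]))  # [False, True]
```

In an autoencoder-based detector of this kind, the model is trained to reconstruct "normal" sequences, so a sequence it reconstructs poorly is treated as anomalous; how DeepAutoCoV encodes Spike sequences and sets its cutoff is described in the full paper, not here.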

Authors

  • Simone Rancati
    Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Adolfo Ferrata 5, Pavia, 27100, Italy.
  • Giovanna Nicora
    Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
  • Mattia Prosperi
    University of Florida, Gainesville, Florida, USA.
  • Riccardo Bellazzi
    Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Pavia, Italy.
  • Marco Salemi
    University of Florida, Department of Pathology and Laboratory Medicine, Gainesville, FL 32610, United States.
  • Simone Marini
    Department of Electrical, Computer and Biomedical Engineering, University of Pavia, Via Ferrata 1, 27100, Pavia, Italy. Email: simone.marini@unipv.it