Machine Learning Early Detection of SARS-CoV-2 High-Risk Variants.

Journal: Advanced science (Weinheim, Baden-Wurttemberg, Germany)
PMID:

Abstract

The severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) has given rise to many high-risk variants, resulting in repeated COVID-19 waves over the past years. Accurate early warning of high-risk variants is therefore vital for epidemic prevention and control. However, detecting high-risk variants through experimental and epidemiological research is time-consuming and often lags behind the emergence and spread of these variants. In this study, HiRisk-Detector, a machine learning algorithm based on haplotype networks, is developed for the early computational detection of high-risk SARS-CoV-2 variants. Using over 7.6 million high-quality, complete SARS-CoV-2 genomes and their metadata, the effectiveness, robustness, and generalizability of HiRisk-Detector are validated. First, evaluated on actual empirical data, HiRisk-Detector successfully detects all 13 high-risk variants, on average 27 days before the corresponding World Health Organization announcements. Second, its robustness is tested by reducing sequencing intensity to one-fourth, which delays detection by only 3.8 days, demonstrating that it remains effective under sparser sequencing. Third, HiRisk-Detector is applied to detect risks among sub-lineages of the SARS-CoV-2 Omicron variant, achieving high ROC-AUC and PR-AUC and confirming its broad applicability. Overall, HiRisk-Detector offers a powerful capacity for the early detection of high-risk variants and holds great utility for public health emergencies caused by infectious diseases or viruses.
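The abstract reports ROC-AUC and PR-AUC for the Omicron sub-lineage evaluation without defining them. The sketch below shows one common way these two metrics are computed for a set of lineages ranked by a risk score. It is not HiRisk-Detector's scoring procedure or the authors' evaluation code, which this abstract does not describe; the function names, labels (1 = high-risk, 0 = not), and scores are hypothetical, and PR-AUC is estimated here as average precision, one standard estimator among several.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC as the probability that a randomly chosen positive
    (hypothetical high-risk lineage) outscores a randomly chosen negative.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """PR-AUC estimated as average precision: the mean of the precision
    values at the rank of each positive, ranking by descending score.
    Tied scores keep their input order (sorted() is stable)."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    hits = 0
    total = 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


if __name__ == "__main__":
    # Hypothetical lineages: label 1 = high-risk, score = model risk score.
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.7, 0.3, 0.1]
    print(round(roc_auc(labels, scores), 4))            # 0.8333
    print(round(average_precision(labels, scores), 4))  # 0.8333
```

ROC-AUC summarizes ranking quality over all lineages, whereas PR-AUC focuses on how precisely the top-ranked lineages are flagged. That makes it the more informative of the two when, as with high-risk variants, positives are rare.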

Authors

  • Lun Li
    College of Big Data and Information Engineering, Guizhou University, Guizhou Provincial Characteristic Key Laboratory of System Optimization and Scientific Computing, Guiyang, Guizhou 550025, PR China.
  • Cuiping Li
  • Na Li
    School of Nursing, Fujian University of Traditional Chinese Medicine, Fuzhou, China.
  • Dong Zou
    BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  • Wenming Zhao
    BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  • Hong Luo
    SAMR Key Laboratory of Human Factors and Ergonomics, China National Institute of Standardization, Beijing, 100191, China.
  • Yongbiao Xue
    China National Center for Bioinformation, Beijing, 100101, China.
  • Zhang Zhang
    BIG Data Center, Beijing Institute of Genomics (BIG), Chinese Academy of Sciences, Beijing, China.
  • Yiming Bao
    BIG Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences, Beijing 100101, China.
  • Shuhui Song
    National Genomics Data Center, Beijing Institute of Genomics, Chinese Academy of Sciences and China National Center for Bioinformation, Beijing 100101, China.