Accurate and fast clade assignment via deep learning and frequency chaos game representation.

Journal: GigaScience
Published Date:

Abstract

BACKGROUND: Since the beginning of the coronavirus disease 2019 pandemic, there has been an explosion of sequencing of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), making it the most widely sequenced virus in history. Several databases and tools have been created to keep track of genome sequences and variants of the virus; most notably, the GISAID platform hosts millions of complete genome sequences and continues to grow daily. A challenging task is the development of fast and accurate tools that can distinguish between the different SARS-CoV-2 variants and assign them to a clade.

Authors

  • Jorge Avila Cartes
    Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Milan 20125, Italy.
  • Santosh Anand
    Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Milan 20125, Italy.
  • Simone Ciccolella
    Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Milan 20125, Italy.
  • Paola Bonizzoni
    Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Milan 20125, Italy.
  • Gianluca Della Vedova
    Department of Computer Science, Systems and Communications, University of Milano-Bicocca, Milan 20125, Italy.