Genomic benchmarks: a collection of datasets for genomic sequence classification.

Journal: BMC Genomic Data
PMID:

Abstract

BACKGROUND: Recently, deep neural networks have been successfully applied in many biological fields. In 2020, the deep learning model AlphaFold won the protein folding competition with predicted structures within the error tolerance of experimental methods. However, this solution to the most prominent bioinformatic challenge of the past 50 years was possible only thanks to a carefully curated benchmark of experimentally determined protein structures. In genomics, we face similar challenges (annotation of genomes and identification of functional elements), but we currently lack benchmarks comparable to the protein folding competition.

Authors

  • Katarína Grešová
    Central European Institute of Technology (CEITEC), Masaryk University, 60177 Brno, Czech Republic.
  • Vlastimil Martinek
    Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia.
  • David Čechák
    Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia.
  • Petr Šimeček
    Centre for Molecular Medicine, Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. petr.simecek@ceitec.muni.cz.
  • Panagiotis Alexiou
    Central European Institute of Technology (CEITEC), Masaryk University, Brno, Czechia. panagiotis.alexiou@ceitec.muni.cz.