Comprehensive benchmark and architectural analysis of deep learning models for nanopore sequencing basecalling.

Journal: Genome Biology
Published Date:

Abstract

BACKGROUND: Nanopore-based DNA sequencing relies on basecalling the electric current signal, and basecalling requires neural networks to achieve competitive accuracies. To improve sequencing accuracy further, new models with new architectures are continuously proposed. However, benchmarking is currently not standardized: evaluation metrics and datasets are defined on a per-publication basis, which impedes progress in the field and makes it impossible to distinguish data-driven from model-driven improvements.

Authors

  • Marc Pagès-Gallego
    Center for Molecular Medicine, University Medical Center Utrecht, Universiteitsweg 100, 3584 CG, Utrecht, The Netherlands.
  • Jeroen de Ridder
    Center for Molecular Medicine, University Medical Center Utrecht, Utrecht, The Netherlands.