A mapping-free natural language processing-based technique for sequence search in nanopore long-reads.

Journal: BMC Bioinformatics
PMID:

Abstract

BACKGROUND: In unforeseen situations, such as nuclear power plant or civilian radiation accidents, effective and computationally inexpensive methods are needed to determine the expression levels of a selected gene panel, allowing rough dose estimates for thousands of donors. There is demand for a new-generation in situ mapper that is fast, has low energy consumption, and works at the level of a single nanopore output. We aim to create a sequence identification tool that uses natural language processing techniques and achieves a high negative predictive value (NPV) relative to the classical approach.
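For reference, the negative predictive value is the fraction of negative calls that are truly negative, NPV = TN / (TN + FN). The sketch below shows one plausible way to compute it when per-read calls from a new tool are compared against reference labels from the classical approach. The function name and the boolean per-read labels are illustrative assumptions, not the authors' evaluation code.

```python
def negative_predictive_value(predicted, reference):
    """Return NPV = TN / (TN + FN).

    predicted, reference: equal-length sequences of booleans, where True
    means "the read contains the target sequence". Treating the reference
    labels as coming from the classical approach is a hypothetical setup.
    """
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have equal length")
    # True negatives: tool says absent, reference says absent.
    tn = sum(1 for p, r in zip(predicted, reference) if not p and not r)
    # False negatives: tool says absent, reference says present.
    fn = sum(1 for p, r in zip(predicted, reference) if not p and r)
    if tn + fn == 0:
        raise ValueError("no negative calls; NPV is undefined")
    return tn / (tn + fn)


pred = [False, False, True, False, True]
ref = [False, True, True, False, False]
print(negative_predictive_value(pred, ref))
```

Here two of the three negative calls agree with the reference, so NPV is 2/3. A high NPV means that a read the tool rejects can be safely discarded, which matters when a fast screening step precedes more expensive analysis.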

Authors

  • Tomasz Strzoda
    Department of Data Science and Engineering, Silesian University of Technology, Gliwice, Poland.
  • Lourdes Cruz-Garcia
    Cancer Mechanisms and Biomarkers Group, Centre for Radiation, Chemical and Environmental Hazards, UK Health Security Agency, Oxfordshire, OX11 0RQ, United Kingdom.
  • Mustafa Najim
    Cancer Mechanisms and Biomarkers Group, Centre for Radiation, Chemical and Environmental Hazards, UK Health Security Agency, Oxfordshire, OX11 0RQ, United Kingdom.
  • Christophe Badie
    Cancer Mechanisms and Biomarkers Group, Centre for Radiation, Chemical and Environmental Hazards, UK Health Security Agency, Oxfordshire, OX11 0RQ, United Kingdom.
  • Joanna Polanska
    Department of Data Science and Engineering, Silesian University of Technology, 44-100 Gliwice, Poland.