Inpactor2: a software based on deep learning to identify and classify LTR-retrotransposons in plant genomes.

Journal: Briefings in bioinformatics
PMID:

Abstract

LTR-retrotransposons are the most abundant repeat sequences in plant genomes and play an important role in evolution and biodiversity. Characterizing them is essential to understanding their dynamics, yet identifying and classifying these elements remains a challenge. Moreover, current software can be relatively slow (from hours to days), sometimes involves a lot of manual work and does not reach satisfactory levels of precision and sensitivity. Here we present Inpactor2, an accurate and fast application that creates LTR-retrotransposon reference libraries in a very short time. Inpactor2 takes an assembled genome as input and follows a hybrid approach (deep learning and structure-based) to detect elements, filter out partial sequences and finally classify intact sequences into superfamilies and, as very few tools do, into lineages. The tool takes advantage of multi-core and GPU architectures to decrease execution times. On the rice genome, Inpactor2 ran in 5 minutes (faster than the other tools) and had the best accuracy and F1-score of the tools tested here; it also had the second best accuracy and specificity, surpassed only by EDTA, while achieving 28% higher sensitivity. For large genomes, Inpactor2 is up to seven times faster than other available bioinformatics tools.
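
The evaluation measures named in the abstract (accuracy, precision, sensitivity, specificity, F1-score) all follow from the four cells of a confusion matrix. The Python sketch below shows the standard textbook definitions only. It is not taken from Inpactor2 or its benchmarking code; the names ConfusionCounts and evaluation_metrics are hypothetical, and the counts in the usage example are illustrative placeholders, not results from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Cells of a binary confusion matrix (hypothetical helper type)."""
    tp: int  # true positives: real elements that were reported
    fp: int  # false positives: reported elements that are not real
    tn: int  # true negatives: non-elements correctly left out
    fn: int  # false negatives: real elements that were missed


def _ratio(numerator: float, denominator: float) -> float:
    """Divide safely, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluation_metrics(c: ConfusionCounts) -> dict:
    """Compute the standard measures from confusion-matrix counts."""
    precision = _ratio(c.tp, c.tp + c.fp)
    sensitivity = _ratio(c.tp, c.tp + c.fn)  # also called recall
    specificity = _ratio(c.tn, c.tn + c.fp)
    accuracy = _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn)
    # F1 is the harmonic mean of precision and sensitivity.
    f1 = _ratio(2 * precision * sensitivity, precision + sensitivity)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }


if __name__ == "__main__":
    # Illustrative counts only; these are NOT values from the paper.
    example = ConfusionCounts(tp=8, fp=2, tn=85, fn=5)
    for name, value in evaluation_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

Because sensitivity and specificity use different denominators, a tool can rank lower on specificity while still detecting a larger share of the true elements, which is the kind of trade-off the comparison with EDTA describes.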

Authors

  • Simon Orozco-Arias
    Department of Computer Science, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.
  • Luis Humberto Lopez-Murillo
    Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia.
  • Mariana S Candamil-Cortés
    Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia.
  • Maradey Arias
    Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia.
  • Paula A Jaimes
    Department of Computer Science, Universidad Autónoma de Manizales, 170001, Caldas, Colombia.
  • Alexandre Rossi Paschoal
    Bioinformatics and Pattern Recognition Group, Department of Computer Science, Federal University of Technology (UTFPR) - Paraná, 80230-901, Paraná, Brazil.
  • Reinel Tabares-Soto
    Department of Electronics and Automation, Universidad Autónoma de Manizales, Manizales, Caldas, Colombia.
  • Gustavo Isaza
    Department of Systems and Informatics, Center for Technology Development - Bioprocess and Agro-industry Plant, Universidad de Caldas, 170004, Caldas, Colombia.
  • Romain Guyot
    Department of Electronics and Automation, Universidad Autónoma de Manizales, 170001, Caldas, Colombia.