Modeling aspects of the language of life through transfer-learning protein sequences.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Predicting protein function and structure from sequence is an important challenge for computational biology. For 26 years, most state-of-the-art approaches have combined machine learning and evolutionary information. However, for some applications, retrieving related proteins is becoming too time-consuming. Additionally, evolutionary information is less powerful for small families, e.g. for proteins from the Dark Proteome. Both problems are addressed by the new methodology introduced here.

Authors

  • Michael Heinzinger
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany. mheinzinger@rostlab.org.
  • Ahmed Elnaggar
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
  • Yu Wang
    Clinical and Technical Support, Philips Healthcare, Shanghai, China.
  • Christian Dallago
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
  • Dmitrii Nechaev
    Department of Informatics, Bioinformatics & Computational Biology - i12, TUM (Technical University of Munich), Boltzmannstr. 3, 85748, Garching/Munich, Germany.
  • Florian Matthes
    TUM Department of Informatics, Software Engineering and Business Information Systems, Boltzmannstr. 1, 85748, Garching/Munich, Germany.
  • Burkhard Rost