Protein remote homology detection and structural alignment using deep learning.

Journal: Nature Biotechnology
Published Date:

Abstract

Exploiting sequence-structure-function relationships in biotechnology requires improved methods for aligning proteins that have low sequence similarity to previously annotated proteins. We develop two deep learning methods to address this gap, TM-Vec and DeepBLAST. TM-Vec allows searching for structure-structure similarities in large sequence databases. It is trained to accurately predict TM-scores as a metric of structural similarity directly from sequence pairs without the need for intermediate computation or solution of structures. Once structurally similar proteins have been identified, DeepBLAST can structurally align proteins using only sequence information by identifying structurally homologous regions between proteins. It outperforms traditional sequence alignment methods and performs similarly to structure-based alignment methods. We show the merits of TM-Vec and DeepBLAST on a variety of datasets, including better identification of remotely homologous proteins compared with state-of-the-art sequence alignment and structure prediction methods.
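
For readers unfamiliar with the TM-score named above, the following is a minimal sketch of the standard TM-score formula, not the authors' code. It assumes the two structures are already aligned and superimposed, so that `distances` holds the Cα-Cα distance in ångströms for each aligned residue pair; the real TM-score additionally maximizes over superpositions. The function names and the 0.5 Å floor on the distance scale for short chains are illustrative assumptions.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 (angstroms) used by the TM-score."""
    if target_length <= 21:
        # Assumption: clamp d0 for very short chains, where the cube-root
        # formula below would give a tiny or negative value.
        return 0.5
    return max(0.5, 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8)


def tm_score(distances, target_length: int) -> float:
    """TM-score for a fixed alignment and superposition.

    distances: distance (angstroms) for each aligned residue pair.
    target_length: length of the protein the score is normalized by.
    """
    d0 = tm_d0(target_length)
    # Each aligned pair contributes a value in (0, 1]; unaligned residues
    # contribute nothing, so the score lies in [0, 1].
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


if __name__ == "__main__":
    # Hypothetical distances for a 100-residue target with 80 aligned pairs.
    example = [1.0] * 60 + [4.0] * 20
    print(round(tm_score(example, 100), 3))
```

A perfect superposition of every residue scores 1, and a pair separated by exactly d0 contributes one half. TM-Vec, as described above, is trained to predict this quantity from sequence pairs without computing structures or distances at all.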

Authors

  • Tymor Hamamsy
    Center for Data Science, New York University, New York, NY, USA.
  • James T Morton
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Robert Blackwell
    Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA.
  • Daniel Berenberg
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Nicholas Carriero
    Scientific Computing Core, Flatiron Institute, Simons Foundation, New York, NY, USA.
  • Vladimir Gligorijević
    Center for Computational Biology, Flatiron Institute, New York, NY, USA. vgligorijevic@flatironinstitute.org.
  • Charlie E M Strauss
    Bioscience Division, Los Alamos National Laboratory, Los Alamos, NM, USA.
  • Julia Koehler Leman
    Center for Computational Biology, Flatiron Institute, New York, NY, USA.
  • Kyunghyun Cho
    Department of Information and Computer Science, Aalto University School of Science, Finland.
  • Richard Bonneau
    Department of Biology, Center for Genomics & Systems Biology, New York University, New York, NY 10003, Perinatology Research Branch, Eunice Kennedy Shriver National Institute of Child Health and Human Development, NIH, Bethesda, MD and Detroit, MI 48201, USA, IBM Thomas J. Watson Research Center, Yorktown Heights, NY 10598, Computer Science Department, Courant Institute of Mathematical Sciences, New York University, New York, NY 10012 and Department of Computer Science, Wayne State University, Detroit, MI 48202, USA.