BetaAlign: a deep learning approach for multiple sequence alignment.

Journal: Bioinformatics (Oxford, England)
PMID:

Abstract

MOTIVATION: Multiple sequence alignments (MSAs) are extensively used in biology, from phylogenetic reconstruction to structure and function prediction. Here, we suggest an out-of-the-box approach for the inference of MSAs, which relies on algorithms developed for processing natural languages. We show that our artificial intelligence (AI)-based methodology can be trained to align sequences by processing alignments that are generated via simulations, and thus different aligners can be easily generated for datasets with specific evolutionary dynamics attributes. We expect that natural language processing (NLP) solutions will replace or augment classic solutions for computing alignments, and more generally, challenging inference tasks in phylogenomics.
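
To make the idea of treating alignment as a language-processing task concrete, the sketch below shows one way a set of unaligned sequences and a simulated alignment could be written as a pair of token "sentences" that a sequence-to-sequence model might be trained on. The abstract does not specify an encoding, so the separator token, the column-by-column target layout, and all function names here (encode_source, encode_target, decode_target, is_valid_alignment) are illustrative assumptions and not the authors' implementation.

```python
from typing import List

SEP = "|"  # hypothetical separator token between unaligned sequences
GAP = "-"  # gap character used in the aligned rows


def encode_source(seqs: List[str]) -> str:
    """Unaligned sequences -> one space-separated source 'sentence'."""
    return " ".join(SEP.join(seqs))


def encode_target(alignment: List[str]) -> str:
    """Aligned rows -> target 'sentence', read column by column (assumed layout)."""
    if len({len(row) for row in alignment}) != 1:
        raise ValueError("all aligned rows must have the same length")
    return " ".join(ch for column in zip(*alignment) for ch in column)


def decode_target(sentence: str, n_seqs: int) -> List[str]:
    """Inverse of encode_target: rebuild the aligned rows from the tokens."""
    tokens = sentence.split()
    if len(tokens) % n_seqs != 0:
        raise ValueError("token count is not a multiple of the number of sequences")
    return ["".join(tokens[i::n_seqs]) for i in range(n_seqs)]


def is_valid_alignment(seqs: List[str], alignment: List[str]) -> bool:
    """An alignment is valid if removing gaps recovers the input sequences."""
    return [row.replace(GAP, "") for row in alignment] == list(seqs)


# One simulated training pair: the source is what a model would read,
# the target is what it would be trained to emit.
seqs = ["AAG", "ACGG"]
alignment = ["A-AG", "ACGG"]
source_sentence = encode_source(seqs)
target_sentence = encode_target(alignment)
```

Under these assumptions, any model output can be decoded with decode_target and checked with is_valid_alignment, which gives a simple test that a predicted alignment only inserts gaps and never alters the input sequences.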

Authors

  • Edo Dotan
    The Henry and Marilyn Taub Faculty of Computer Science, Technion - Israel Institute of Technology, Haifa 3200003, Israel.
  • Elya Wygoda
    The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
  • Noa Ecker
    The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
  • Michael Alburquerque
    The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv 69978, Israel.
  • Oren Avram
    The Shmunis School of Biomedicine and Cancer Research, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.
  • Yonatan Belinkov
    The Henry and Marilyn Taub Faculty of Computer Science, Technion - Israel Institute of Technology, Haifa 3200003, Israel.
  • Tal Pupko
    Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, 94720, USA.