Harnessing machine learning to guide phylogenetic-tree search algorithms.

Journal: Nature Communications
Published Date:

Abstract

Inferring a phylogenetic tree is a fundamental challenge in evolutionary studies. Current paradigms for phylogenetic tree reconstruction rely on performing costly likelihood optimizations. With the aim of making tree inference feasible for problems involving more than a handful of sequences, inference under the maximum-likelihood paradigm integrates heuristic approaches to evaluate only a subset of all potential trees. Consequently, existing methods suffer from the known tradeoff between accuracy and running time. In this proof-of-concept study, we train a machine-learning algorithm over an extensive cohort of empirical data to predict the neighboring trees that increase the likelihood, without actually computing their likelihood. This provides a means to safely discard a large portion of the search space, thus potentially accelerating heuristic tree searches without losing accuracy. Our analyses suggest that machine learning can guide tree-search methodologies towards the most promising candidate trees.
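
The idea in the abstract, ranking neighboring trees with a cheap learned predictor and computing the likelihood only for the most promising ones, can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration: `guided_search_step`, `predict_score`, `log_likelihood`, and `keep_fraction` are hypothetical and do not come from the paper, and the abstract does not specify the model, the features, or the cutoff actually used.

```python
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")  # any representation of a candidate tree


def guided_search_step(
    neighbors: Sequence[T],
    predict_score: Callable[[T], float],   # hypothetical learned ranker (cheap)
    log_likelihood: Callable[[T], float],  # exact evaluation (costly)
    keep_fraction: float = 0.1,            # assumed cutoff, not from the paper
) -> Tuple[T, float, int]:
    """Return (best shortlisted neighbor, its log-likelihood, costly evaluations)."""
    if not neighbors:
        raise ValueError("no neighbors to evaluate")
    # Rank every candidate by the cheap predicted score, best first.
    ranked = sorted(neighbors, key=predict_score, reverse=True)
    # Keep only the top fraction, but always at least one candidate.
    n_keep = max(1, int(len(ranked) * keep_fraction))
    shortlist = ranked[:n_keep]
    # Spend the costly likelihood computation on the shortlist only.
    scored = [(log_likelihood(tree), tree) for tree in shortlist]
    best_ll, best_tree = max(scored, key=lambda pair: pair[0])
    return best_tree, best_ll, n_keep
```

In this sketch the saving comes from calling `log_likelihood` on only `n_keep` candidates instead of all of them. The result matches an exhaustive evaluation only when the predictor places the truly best neighbor inside the shortlist, which is the accuracy question the study investigates.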

Authors

  • Dana Azouri
    School of Plant Sciences and Food Security, Tel Aviv University, Ramat Aviv, Tel-Aviv, Israel.
  • Shiran Abadi
    Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, Israel.
  • Yishay Mansour
    Blavatnik School of Computer Science, Tel-Aviv University, Ramat Aviv, Tel-Aviv, Israel.
  • Itay Mayrose
    Department of Molecular Biology and Ecology of Plants, Tel Aviv University, Tel Aviv, Israel.
  • Tal Pupko
    Department of Earth and Planetary Science, UC Berkeley, Berkeley, CA, 94720, USA.