Evolutionarily informed machine learning enhances the power of predictive gene-to-phenotype relationships.

Journal: Nature Communications
Published Date:

Abstract

Inferring phenotypic outcomes from genomic features is both a promise and a challenge for systems biology. In this study, we address two such challenges: using gene expression data to predict phenotypic outcomes, and functionally validating the genes with predictive power. We applied an evolutionarily informed machine learning approach to predict phenotypes based on transcriptome responses shared both within and across species. Specifically, we exploited the phenotypic diversity in nitrogen use efficiency (NUE) and the evolutionarily conserved transcriptome responses to nitrogen treatments across Arabidopsis accessions and maize varieties. We demonstrate that using evolutionarily conserved nitrogen-responsive genes is a biologically principled way to reduce feature dimensionality in machine learning, and that it ultimately improves the predictive power of our gene-to-trait models. Further, we functionally validated seven candidate transcription factors with predictive power for NUE outcomes in Arabidopsis and one in maize. Moreover, applying our evolutionarily informed pipeline to other species, including rice and mouse models, underscores its potential to uncover genes affecting any physiological or clinical trait of interest across biology, agriculture, or medicine.
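The central idea of the abstract, shrinking the feature space to genes whose nitrogen response is shared across species before fitting a gene-to-trait model, can be illustrated with a toy sketch. This is not the authors' pipeline. The inputs (a set of N-responsive genes per species, an Arabidopsis-to-maize ortholog map, per-sample expression dictionaries), the helper names, and the k-nearest-neighbours regressor are hypothetical stand-ins chosen for brevity, and the gene IDs and trait values in the usage note are made up.

```python
"""Toy sketch (hypothetical): keep only genes whose nitrogen (N) response
is conserved across two species, then predict a trait such as NUE from
expression restricted to those genes."""
from math import sqrt


def conserved_n_responsive(arabidopsis_degs, maize_degs, ortholog_map):
    """Return Arabidopsis genes that are N-responsive AND have at least one
    maize ortholog that is also N-responsive. All inputs are assumed:
    sets of gene IDs and a dict mapping gene -> list of ortholog IDs."""
    return sorted(
        g for g in arabidopsis_degs
        if any(o in maize_degs for o in ortholog_map.get(g, ()))
    )


def select_features(samples, genes):
    """Project each sample (dict gene -> expression value) onto the chosen
    genes, giving one feature vector per sample in a fixed gene order."""
    return [[s[g] for g in genes] for s in samples]


def knn_predict(train_x, train_y, query, k=3):
    """Predict a trait as the mean trait value of the k training samples
    closest to `query` by Euclidean distance. A simple stand-in model;
    any regressor could be used on the reduced feature set."""
    dists = sorted(
        (sqrt(sum((a - b) ** 2 for a, b in zip(x, query))), y)
        for x, y in zip(train_x, train_y)
    )
    nearest = dists[:k]
    return sum(y for _, y in nearest) / len(nearest)
```

For example, with `orthologs = {"AT1": ["Zm1"], "AT2": ["Zm2"], "AT3": ["Zm3", "Zm9"]}`, Arabidopsis N-responsive genes `{"AT1", "AT2", "AT3", "AT4"}` and maize N-responsive genes `{"Zm1", "Zm3"}`, only `["AT1", "AT3"]` survive the filter. A gene such as `AT2`, which has no conserved response here, is dropped before training, so an extreme value for it in a new sample cannot distort the prediction. The design choice mirrors the abstract's argument: prior biological knowledge (cross-species conservation) prunes features before the model ever sees them, rather than relying on the model to discard noise.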

Authors

  • Chia-Yi Cheng
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Ying Li
    School of Information Engineering, Chang'an University, Xi'an 710010, China.
  • Kranthi Varala
    Department of Horticulture and Landscape Architecture, Purdue University, West Lafayette, IN, USA.
  • Jessica Bubert
    Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Ji Huang
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Grace J Kim
    Department of Rehabilitation Medicine, NewYork-Presbyterian/Weill Cornell Medical Center, New York, NY. Electronic address: grk9006@nyp.org.
  • Justin Halim
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Jennifer Arp
    Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Hung-Jui S Shih
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Grace Levinson
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Seo Hyun Park
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Ha Young Cho
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA.
  • Stephen P Moose
    Department of Crop Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA.
  • Gloria M Coruzzi
    Department of Biology, Center for Genomics and Systems Biology, New York University, New York, NY, 10003, USA. gloria.coruzzi@nyu.edu.