A deep convolutional neural network approach for predicting phenotypes from genotypes.

Journal: Planta
Published Date:

Abstract

Deep learning is a promising technology to accurately select individuals with high phenotypic values based on genotypic data. Genomic selection (GS) is a promising breeding strategy by which the phenotypes of plant individuals are usually predicted based on genome-wide markers of genotypes. In this study, we present a deep learning method, named DeepGS, to predict phenotypes from genotypes. Using a deep convolutional neural network, DeepGS uses hidden variables that jointly represent features in genotypes when making predictions; it also employs convolution, sampling and dropout strategies to reduce the complexity of high-dimensional genotypic data. We used a large GS dataset to train DeepGS and compared its performance with other methods. The experimental results indicate that DeepGS can be used as a complement to the commonly used RR-BLUP in the prediction of phenotypes from genotypes. The complementarity between DeepGS and RR-BLUP can be utilized using an ensemble learning approach for more accurately selecting individuals with high phenotypic values, even for the absence of outlier individuals and subsets of genotypic markers. The source codes of DeepGS and the ensemble learning approach have been packaged into Docker images for facilitating their applications in different GS programs.

Authors

  • Wenlong Ma
    College of Food Science and Engineering, Yangzhou University, Yangzhou 225127, China.
  • Zhixu Qiu
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Jie Song
    School of Chemistry, Dalian University of Technology, Dalian 116023, PR China.
  • Jiajia Li
    Shanghai Artificial Intelligence Research Institute Co., Ltd, Shanghai, China.
  • Qian Cheng
    Medical Image Processing, Analysis, and Visualization (MIVAP) Lab, School of Electronics and Information Engineering, Soochow University, Suzhou, China.
  • Jingjing Zhai
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China.
  • Chuang Ma
    State Key Laboratory of Crop Stress Biology for Arid Areas, Center of Bioinformatics, College of Life Sciences, Northwest A&F University, Yangling, 712100, Shaanxi, China. cma@nwafu.edu.cn.