A Multivariate Poisson Deep Learning Model for Genomic Prediction of Count Data.

Journal: G3 (Bethesda, Md.)
Published Date:

Abstract

Genomic selection (GS) is a revolutionary paradigm for developing new plants and animals. It is a predictive methodology, since it relies on learning methods to perform its task. Unfortunately, no universal model can be used for all types of predictions; specific methodologies are therefore required for each type of output (response variable). Because efficient methodologies for multivariate count outcomes are lacking, this paper proposes a multivariate Poisson deep neural network (MPDN) model for the simultaneous genomic prediction of several count outcomes. The MPDN model uses the negative log-likelihood of a Poisson distribution as its loss function, the rectified linear unit (ReLU) activation function in the hidden layers to capture nonlinear patterns, and the exponential activation function in the output layer to produce outputs on the same scale as the counts. The proposed MPDN model was compared with conventional generalized Poisson regression models and univariate Poisson deep learning models on two experimental count data sets. We found that the proposed MPDN model outperformed the univariate Poisson deep neural network models but did not outperform, in terms of prediction, the univariate generalized Poisson regression models. All deep learning models were implemented with TensorFlow as the back-end and Keras as the front-end, which allows these models to be fitted to moderate and large data sets, a significant advantage over previous GS models for multivariate count data.
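
The three ingredients named in the abstract (ReLU hidden layers, an exponential output layer, and a Poisson log-likelihood loss) can be illustrated with a minimal sketch in plain Python. This is not the authors' TensorFlow/Keras implementation: the function names, the single hidden layer, and the toy weights are all assumptions made for illustration. The output layer has one unit per count trait, so a single forward pass yields a positive Poisson mean for every trait at once.

```python
import math


def dense(x, weights, bias):
    """Affine layer: one row of `weights` (and one bias) per output unit."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def relu(v):
    """Rectified linear unit, applied element-wise."""
    return [max(0.0, z) for z in v]


def forward(x, params):
    """Hypothetical one-hidden-layer network with an exponential output layer.

    Returns one strictly positive Poisson mean per count trait.
    """
    hidden = relu(dense(x, params["W1"], params["b1"]))
    eta = dense(hidden, params["W2"], params["b2"])  # linear predictor per trait
    return [math.exp(e) for e in eta]                # exp keeps the means positive


def poisson_nll(y, mu):
    """Negative Poisson log-likelihood summed over traits.

    For each trait: mu - y * log(mu) + log(y!). The log(y!) term does not
    depend on the network weights, so it can be dropped during training.
    """
    return sum(m - yi * math.log(m) + math.lgamma(yi + 1)
               for yi, m in zip(y, mu))


# Toy example (assumed values): two markers, two hidden units, two count traits.
params = {
    "W1": [[0.5, 0.1], [-0.3, 0.2]], "b1": [0.0, 0.0],
    "W2": [[1.0, 0.0], [0.0, 1.0]], "b2": [0.0, 0.0],
}
mu = forward([1.0, -2.0], params)
loss = poisson_nll([2, 0], mu)
```

Because the exponential output can never be negative or zero, the logarithm in the loss is always defined, which is the practical reason for pairing this activation with a Poisson likelihood.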

Authors

  • Osval Antonio Montesinos-López
    Facultad de Telemática, Universidad de Colima, Colima, 28040, México.
  • José Cricelio Montesinos-López
    Departamento de Estadística, Centro de Investigación en Matemáticas (CIMAT), Guanajuato, 36023, México.
  • Pawan Singh
    Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, CP 52640, Mexico.
  • Nerida Lozano-Ramirez
    Biometrics and Statistics Unit, Genetic Resources Program, International Maize and Wheat Improvement Center (CIMMYT), Km 45 Carretera Mexico-Veracruz, CP 52640, Mexico.
  • Alberto Barrón-López
    Department of Animal Production (DPA), Universidad Nacional Agraria La Molina, Av. La Molina s/n La Molina, 15024, Lima, Perú.
  • Abelardo Montesinos-López
    Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México.
  • José Crossa
    Biometrics and Statistics Unit (BSU), International Maize and Wheat Improvement Center (CIMMYT), Apdo Postal 6-641, México DF, 06600 24105, México. j.crossa@cgiar.org.