Ensemble of Bayesian alphabets via constraint weight optimization strategy improves genomic prediction accuracy.

Journal: G3 (Bethesda, Md.)
Published Date:

Abstract

This study proposes a weight-optimization-based ensemble framework for improving genomic prediction accuracy. The framework combines 8 Bayesian models (BayesA, BayesB, BayesC, BayesBpi, BayesCpi, BayesR, BayesL, and BayesRR), with the weight assigned to each model optimized using a genetic algorithm. The ensemble model, named EnBayes, was evaluated on 18 datasets from 4 crop species and showed improved prediction accuracy compared with the individual Bayesian models. New objective functions were proposed to improve prediction accuracy in terms of both Pearson's correlation coefficient and mean squared error. The accuracy of the ensemble model was associated with the number of models included in the framework: a few more-accurate models achieved accuracy similar to that of a larger number of less-accurate models. In addition, over-biased and under-biased models influenced the bias of the ensemble model. The study also explored a meta-learning approach using the Bayesian models as base learners and random forest, quantile regression forest, and ridge regression as meta-learners; EnBayes outperformed this approach. When the traditional genomic prediction models GBLUP and rrBLUP and the machine learning models support vector machine, random forest, extreme gradient boosting, and light gradient boosting were included in the ensemble framework in addition to the Bayesian models, the ensemble model achieved higher accuracy than the individual Bayesian, BLUP, and machine learning models. We believe that EnBayes will contribute significantly to ongoing efforts to improve genomic prediction accuracy.
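
The weighted-ensemble idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The abstract does not give the constraints, the objective functions, or the genetic algorithm settings, so the following are all assumptions made for illustration: weights are constrained to be non-negative and to sum to 1, the fitness is Pearson's correlation minus mean squared error, and the genetic algorithm is a small elitist one with blend crossover and Gaussian mutation. All function and parameter names (`optimize_weights`, `fitness`, `mutation_sd`, and so on) are hypothetical.

```python
import random


def pearson(x, y):
    """Pearson's correlation coefficient; returns 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def mse(x, y):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)


def normalize(w):
    """Assumed constraint: clip weights at zero and rescale them to sum to 1."""
    w = [max(v, 0.0) for v in w]
    s = sum(w)
    return [v / s for v in w] if s > 0 else [1.0 / len(w)] * len(w)


def combine(preds, w):
    """Weighted sum of per-model predictions (preds[k][i]: model k, individual i)."""
    return [sum(wk * p[i] for wk, p in zip(w, preds)) for i in range(len(preds[0]))]


def fitness(w, preds, y):
    """Hypothetical objective: reward high correlation, penalize high error."""
    yhat = combine(preds, w)
    return pearson(yhat, y) - mse(yhat, y)


def optimize_weights(preds, y, pop_size=40, generations=60, mutation_sd=0.1, seed=0):
    """Small elitist genetic algorithm over the model weights."""
    rng = random.Random(seed)
    k = len(preds)
    # Start from equal weights plus random candidates that satisfy the constraint.
    pop = [[1.0 / k] * k]
    pop += [normalize([rng.random() for _ in range(k)]) for _ in range(pop_size - 1)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, preds, y), reverse=True)
        elite = pop[:max(2, pop_size // 4)]  # the best candidates survive unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.sample(elite, 2)
            alpha = rng.random()
            # Blend crossover followed by Gaussian mutation, then re-apply the constraint.
            child = [alpha * u + (1 - alpha) * v + rng.gauss(0, mutation_sd)
                     for u, v in zip(a, b)]
            children.append(normalize(child))
        pop = elite + children
    return max(pop, key=lambda w: fitness(w, preds, y))
```

In use, `preds` would hold each model's predicted values for a validation set and `y` the observed phenotypes; the returned weights would then be applied to the same models' predictions on held-out data. Optimizing and evaluating on the same individuals would overstate accuracy.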

Authors

  • Prabina Kumar Meher
    Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi-110012, India.
  • Upendra Kumar Pradhan
    Studio of Computational Biology & Bioinformatics, The Himalayan Centre for High-throughput Computational Biology (HiCHiCoB, a BIC supported by DBT, India), CSIR-Institute of Himalayan Bioresource Technology (CSIR-IHBT), Palampur (HP), India.
  • Mrinmoy Ray
    Division of Forecasting and Agricultural Systems Modeling, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India.
  • Ajit Gupta
    Division of Statistical Genetics, ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India. Electronic address: ajit@icar.gov.in.
  • Rajender Parsad
    ICAR-Indian Agricultural Statistics Research Institute, PUSA, New Delhi 110012, India. Electronic address: rajender.parsad@icar.gov.in.
  • Pushpendra Kumar Gupta
    Department of Genetics and Plant Breeding, Chaudhary Charan Singh University, Meerut 250004, India.
