Machine learning algorithms translate big data into predictive breeding accuracy.

Journal: Trends in plant science
PMID:

Abstract

Statistical machine learning (ML) extracts patterns from extensive genomic, phenotypic, and environmental data. ML algorithms automatically identify relevant features and use cross-validation to ensure robust models and improve prediction reliability in new lines. Furthermore, ML analyses of genotype-by-environment (G×E) interactions can offer insights into the genetic factors that affect performance in specific environments. By leveraging historical breeding data, ML streamlines strategies and automates analyses to reveal genomic patterns. In this review, we examine the transformative impact of big data, including multi-trait genomics, phenomics, and environmental covariables, on genomic-enabled prediction in plant breeding. We discuss how big data and ML are revolutionizing the field by enhancing prediction accuracy, deepening our understanding of G×E interactions, and optimizing breeding strategies through the analysis of extensive and diverse datasets.

Authors

  • José Crossa
    Biometrics and Statistics Unit (BSU), International Maize and Wheat Improvement Center (CIMMYT), Apdo Postal 6-641, México DF, 06600 24105, México. j.crossa@cgiar.org.
  • Osval A Montesinos-López
    Facultad de Telemática. oamontes1@ucol.mx.
  • Germano Costa-Neto
    Cornell University, Ithaca, NY, USA.
  • Paolo Vitale
    International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico.
  • Johannes W R Martini
    Biometrics and Statistics Unit, International Maize and Wheat Improvement Center (CIMMYT), Km 45, CP 52640, Carretera Mexico-Veracruz, Mexico.
  • Daniel Runcie
    Department of Plant Sciences, University of California Davis, Davis, CA, USA.
  • Roberto Fritsche-Neto
    Louisiana State University, College of Agriculture, Baton Rouge, LA, USA.
  • Abelardo Montesinos-López
    Departamento de Matemáticas, Centro Universitario de Ciencias Exactas e Ingenierías (CUCEI), Universidad de Guadalajara, 44430, Guadalajara, Jalisco, México.
  • Paulino Pérez-Rodríguez
    Colegio de Postgraduados, Campus Montecillo, Texcoco, México, 056230, México. perpdgo@gmail.com.
  • Guillermo Gerard
    International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico.
  • Susanna Dreisigacker
    International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico.
  • Leonardo Crespo-Herrera
    Global Wheat Program, International Maize and Wheat Improvement Center, Texcoco, Estado de Mexico, Mexico.
  • Carolina Saint Pierre
    International Maize and Wheat Improvement Center (CIMMYT), Carretera México-Veracruz Km 45, El Batán, Texcoco, CP 56237, Estado de México, Mexico.
  • Morten Lillemo
    Norwegian University of Life Sciences (NMBU), Department of Plant Science, Ås, Norway.
  • Jaime Cuevas
    Universidad de Quintana Roo, Chetumal, Quintana Roo, 77019, Mexico.
  • Alison Bentley
    Australian National University, Research School of Biology, Canberra, ACT, Australia. Electronic address: Alison.Bentley@anu.edu.au.
  • Rodomiro Ortiz
    Department of Plant Breeding, Swedish University of Agricultural Sciences (SLU), PO Box 190, Sundsvägen 10, SE 23422 Lomma, Sweden. Electronic address: rodomiro.ortiz@slu.se.