Generative AI for predictive breeding: hopes and caveats.

Journal: TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik
Published Date:

Abstract

Within the broad area of artificial intelligence (AI), generative AI algorithms have emerged as a revolutionary technology able to produce highly realistic 'synthetic' data, akin to standard simulation but with fewer constraints. The main focus of generative AI has been on phenotypes, but here we argue that it can also be used to generate synthetic environments and genotypes. This data-driven technology may be able to overcome some of the limitations of standard simulations, such as strong assumptions about the underlying genotype-to-phenotype map. We discuss key features of popular generative models, including autoregressive models, generative adversarial networks, variational autoencoders, diffusion and flow-based models. Several of these methods use a latent space, often of lower dimensionality than the raw data, that can help make the models interpretable and can serve as a link between simulation and generative algorithms. Augmenting data as realistically as possible with genAI can improve inference and the predictive performance of genomic prediction models, but symbolic simulation will continue to play a fundamental role in predictive breeding. A hybrid tool that implements both approaches can be extremely powerful for evaluating predictive breeding strategies in silico. One promising direction is to simulate novel genotypes using conventional methods, then apply generative models to produce realistic phenotypes conditional on genotype and environment.
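The hybrid direction described in the last sentence can be illustrated with a minimal sketch. It is not the authors' implementation: all function names (make_gamete, cross, toy_generator, hybrid_pipeline), the recombination rate, and the toy genotypes are assumptions made for illustration only. In particular, toy_generator is a hypothetical stand-in that merely marks where a trained conditional generative model (for example a conditional variational autoencoder or diffusion model) would be plugged in; its linear-Gaussian form is exactly the kind of strong assumption a learned model would aim to relax.

```python
import random


def make_gamete(parent, rng, recomb_rate=0.1):
    """Conventional (symbolic) step: sample one gamete from a diploid parent.

    `parent` is a pair of haplotypes (lists of 0/1 alleles). Starting on a
    random strand, switch strands with probability `recomb_rate` between
    adjacent loci. The rate is an illustrative assumption.
    """
    strand = rng.randrange(2)
    gamete = []
    for locus in range(len(parent[0])):
        if locus > 0 and rng.random() < recomb_rate:
            strand = 1 - strand
        gamete.append(parent[strand][locus])
    return gamete


def cross(mother, father, rng, recomb_rate=0.1):
    """Simulate one novel diploid offspring from two parents."""
    return (make_gamete(mother, rng, recomb_rate),
            make_gamete(father, rng, recomb_rate))


def toy_generator(dosages, env, rng):
    """Hypothetical placeholder for a trained model of p(y | genotype, env).

    A real generative model would be learned from data; this toy version
    just draws from a Gaussian so that the pipeline runs end to end.
    """
    mean = 0.5 * sum(dosages) + env
    return rng.gauss(mean, 1.0)


def hybrid_pipeline(mother, father, envs, generator, n_offspring, seed=0):
    """Simulate genotypes conventionally, then generate phenotypes
    conditional on genotype and environment with `generator`."""
    rng = random.Random(seed)
    records = []
    for _ in range(n_offspring):
        child = cross(mother, father, rng)
        # Allele dosage (0, 1 or 2) per locus.
        dosages = [a + b for a, b in zip(*child)]
        for env in envs:
            records.append((dosages, env, generator(dosages, env, rng)))
    return records


# Toy parents with five biallelic loci each (made-up data).
mother = ([0, 0, 1, 1, 0], [1, 1, 0, 0, 1])
father = ([1, 0, 1, 0, 1], [0, 1, 0, 1, 0])
records = hybrid_pipeline(mother, father, envs=[-1.0, 1.0],
                          generator=toy_generator, n_offspring=3)
```

Passing the generator as an argument keeps the symbolic part (meiosis, crossing) separate from the data-driven part, so the placeholder can be swapped for a trained model without touching the genotype simulation.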

Authors

  • M Pérez-Enciso
    ICREA - Centre for Research in Agricultural Genomics, Barcelona, Spain.
  • L M Zingaretti
    Centre for Research in Agricultural Genomics (CRAG), CSIC-IRTA-UAB-UB, Campus UAB, 08193, Bellaterra, Barcelona, Spain.
  • G de Los Campos
    Departments of Epidemiology and Biostatistics, Michigan State University, East Lansing, MI, 48824, USA.