Breaking down data silos across companies to train genome-wide predictions: A feasibility study in wheat.

Journal: Plant Biotechnology Journal

Abstract

Big data, combined with artificial intelligence (AI) techniques, holds the potential to significantly enhance the accuracy of genome-wide predictions. Motivated by the success reported for wheat hybrids, we extended the scope to inbred lines by integrating phenotypic and genotypic data from four commercial wheat breeding programs. Acting as an academic data trustee, we merged these data with historical experimental series from previous public-private partnerships. The integrated data spanned 12 years and 168 environments and provided a genomic prediction training set of up to approximately 9500 genotypes for grain yield, plant height and heading date. Although the phenotypic and genotypic data were heterogeneous, rigorous data curation, including SNP imputation, yielded a high-quality data set. We used the data to compare genomic best linear unbiased prediction (GBLUP) with convolutional neural network-based genomic prediction. Our analysis showed that experimental series could be flexibly combined for genomic prediction, with prediction ability improving steadily as training set size increased up to around 4000 genotypes. Beyond that size, the gains in prediction ability diminished, and prediction ability approached a plateau well below the theoretical limit defined by the square root of the heritability. Potential avenues, such as designed training sets or novel non-linear prediction approaches, could overcome this plateau and help to more fully exploit the high-value big data generated by breaking down data silos across companies.
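To make the relationship between training set size, prediction ability and the square-root-of-heritability limit concrete, the following is a minimal sketch on simulated data, not the pipeline used in the study. Prediction ability is taken here as the correlation between predicted and observed phenotypes; since this equals prediction accuracy times the square root of the heritability, it cannot exceed sqrt(h2) in expectation. Everything in the sketch is an assumption made for illustration: the function names (vanraden_grm, gblup_predict, simulate, ability_curve), the marker simulation with independent loci, the heritability of 0.6 treated as known, and the use of the training mean as the only fixed effect.

```python
import numpy as np


def vanraden_grm(markers):
    """Genomic relationship matrix from a 0/1/2 marker matrix (VanRaden method 1)."""
    p = markers.mean(axis=0) / 2.0          # allele frequency per marker
    z = markers - 2.0 * p                   # centre each marker column
    return z @ z.T / (2.0 * np.sum(p * (1.0 - p)))


def gblup_predict(grm, y_train, train_idx, test_idx, h2):
    """GBLUP with an assumed, known heritability; the mean is the only fixed effect."""
    lam = (1.0 - h2) / h2                   # residual-to-genetic variance ratio
    k_tt = grm[np.ix_(train_idx, train_idx)]
    k_pt = grm[np.ix_(test_idx, train_idx)]
    mu = y_train.mean()                     # simplification: plain training mean
    alpha = np.linalg.solve(k_tt + lam * np.eye(len(train_idx)), y_train - mu)
    return mu + k_pt @ alpha


def simulate(n, m, h2, rng):
    """Hypothetical data: independent markers, additive effects, target heritability h2."""
    freqs = rng.uniform(0.1, 0.9, size=m)
    markers = rng.binomial(2, freqs, size=(n, m)).astype(float)
    g = markers @ rng.normal(size=m)
    g = (g - g.mean()) / g.std()            # genetic variance scaled to 1
    e = rng.normal(scale=np.sqrt((1.0 - h2) / h2), size=n)
    return markers, g + e


def ability_curve(markers, y, train_sizes, n_test, h2):
    """Prediction ability on a fixed test set for growing training sets."""
    n = len(y)
    grm = vanraden_grm(markers)
    test_idx = np.arange(n - n_test, n)     # last n_test individuals are held out
    curve = {}
    for size in train_sizes:
        train_idx = np.arange(size)         # first `size` individuals are used for training
        pred = gblup_predict(grm, y[train_idx], train_idx, test_idx, h2)
        curve[size] = float(np.corrcoef(pred, y[test_idx])[0, 1])
    return curve


h2 = 0.6
rng = np.random.default_rng(0)
markers, y = simulate(n=600, m=300, h2=h2, rng=rng)
for size, ability in ability_curve(markers, y, [50, 100, 200, 400], 200, h2).items():
    print(f"training size {size:4d}: prediction ability {ability:.2f}")
print(f"theoretical limit sqrt(h2): {np.sqrt(h2):.2f}")
```

In a simulation like this one, prediction ability would be expected to rise with training set size while the gains shrink, staying below sqrt(h2). The individuals here are unrelated and the trait is purely additive, so the curve says nothing about where a plateau lies in real breeding data.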

Authors

  • Moritz Lell
    Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany.
  • Abhishek Gogna
    Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany.
  • Vincent Kloesgen
    Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany.
  • Ulrike Avenhaus
    W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany.
  • Jost Dörnte
    Deutsche Saatveredelung AG, Lippstadt, Germany.
  • Wera Maria Eckhoff
    KWS SAAT SE & Co. KGaA, Einbeck, Germany.
  • Tobias Eschholz
    Nordsaat Saatzucht GmbH, Langenstein, Germany.
  • Mario Gils
    Nordsaat Saatzucht GmbH, Langenstein, Germany.
  • Martin Kirchhoff
    Nordsaat Saatzucht GmbH, Langenstein, Germany.
  • Michael Koch
    Veterinary Specialists and Emergency Services, 825 White Spruce Blvd, Rochester, NY 14623, United States. Electronic address: kochm@att.net.
  • Sonja Kollers
    KWS SAAT SE & Co. KGaA, Einbeck, Germany.
  • Nina Pfeiffer
    KWS LOCHOW GmbH, Northeim, Germany.
  • Matthias Rapp
    W. von Borries-Eckendorf GmbH & Co. KG, Leopoldshöhe, Germany.
  • Valentin Wimmer
    KWS SAAT SE, Grimsehlstr. 31, 37574, Einbeck, Germany.
  • Markus Wolf
    SU BIOTEC GmbH, Gatersleben, Germany.
  • Jochen Reif
    Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany.
  • Yusheng Zhao
    Leibniz Institute for Plant Genetics and Crop Plant Research, Seeland, Germany.