Identifying barley pan-genome sequence anchors using genetic mapping and machine learning.

Journal: TAG. Theoretical and applied genetics. Theoretische und angewandte Genetik

Abstract

We identified 1.844 million barley pan-genome sequence anchors from 12,306 genotypes using genetic mapping and machine learning. There is increasing evidence that the genes of any single crop genotype are far from covering all genes in that species, so building more comprehensive pan-genomes is of great importance for genetic research and breeding. Obtaining a pan-genome at the scale of a thousand genotypes from deep-sequencing data is currently impractical for species such as barley, which has a huge and highly repetitive genome. To address this, we set out to identify barley pan-genome sequence anchors from a large collection of genotyping-by-sequencing (GBS) datasets by combining genetic mapping with machine learning algorithms. Using GBS sequences from 11,166 domesticated and 1,140 wild barley genotypes, we identified 1.844 million pan-genome sequence anchors, of which 532,253 were identified as presence/absence variation (PAV) tags. Aligning these PAV tags to the genome of the hulless barley genotype Zangqing320 validated 83.6% of those from the domesticated genotypes and 88.6% of those from the wild barley genotypes. Association analyses for flowering time, plant height and kernel size showed that the relative importance of PAV and non-PAV tags varied among traits. Pan-genome sequence anchors based on GBS tags can facilitate the construction of a comprehensive pan-genome and greatly assist various genetic studies in barley, including the identification of structural variation, genetic mapping and breeding.

Authors

  • Shang Gao
    Department of Orthopedics, Orthopedic Center of Chinese PLA, Southwest Hospital, Third Military Medical University, Chongqing, 400038, P.R.China.
  • Jinran Wu
    School of Mathematics and Physics, The University of Queensland, Brisbane, QLD, Australia.
  • Jiri Stiller
    Agriculture and Food, CSIRO, St Lucia, QLD, 4067, Australia.
  • Zhi Zheng
    Department of Chemical Engineering, School of Chemistry and Chemical Engineering, Nanjing University.
  • Meixue Zhou
    Tasmanian Institute of Agriculture, University of Tasmania, Prospect, TAS, 7250, Australia.
  • You-Gan Wang
    School of Mathematics and Physics, The University of Queensland, Brisbane, QLD, Australia.
  • Chunji Liu
    Agriculture and Food, CSIRO, St Lucia, QLD, 4067, Australia. Chunji.liu@csiro.au.