Machine learning approach for pooled DNA sample calibration.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Despite ongoing reduction in genotyping costs, genomic studies involving large numbers of species with low economic value (such as Black Tiger prawns) remain cost-prohibitive. In this scenario, DNA pooling is an attractive option for reducing genotyping costs. However, genotyping pooled samples comprising DNA from many individuals is challenging because measurement errors can exceed the allele frequency quantisation size and therefore cannot be corrected simply by clustering techniques. The solution to the calibration problem is a correction to the allele frequency that mitigates errors incurred in the measurement process. We highlight the limitations of existing calibration solutions, such as the fact that they impose assumptions on the variation between allele frequencies 0, 0.5, and 1.0 and address only a limited set of error types. We propose a novel machine learning method to address the limitations identified.

Authors

  • Andrew D Hellicar
    CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia. andrew.hellicar@csiro.au.
  • Ashfaqur Rahman
    CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia. ashfaqur.rahman@csiro.au.
  • Daniel V Smith
    CSIRO Computational Informatics, Castray Esplanade, Hobart, Australia. daniel.v.smith@csiro.au.
  • John M Henshall
    CSIRO Agriculture Flagship, Armidale, Australia. john.henshall@csiro.au.