Forensic STR allele extraction using a machine learning paradigm.

Journal: Forensic science international. Genetics
PMID:

Abstract

We present Fragsifier, a machine learning approach to detecting and extracting short tandem repeat (STR) sequences from massively parallel sequencing data. STRs are detected on each read by first locating the longest repeat stretches, then predicting the locus from k-mers using a machine learning sequence model. Reference flanking sequences are then aligned to determine the precise STR boundaries. We show that Fragsifier produces genotypes concordant with profiles obtained using capillary electrophoresis (CE), and we compare its results with those of STRait Razor and the ForenSeq UAS. The data pre-processing and the training of the sequence classifier are readily scripted, allowing the analyst to experiment with different thresholds, datasets, loci of interest, and machine learning models.
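
The first two steps described above (locating the longest repeat stretch on a read, then deriving k-mer features for locus prediction) can be illustrated with a minimal sketch. This is not Fragsifier's code: the function names, the fixed 4-base repeat unit, the minimum of three repeats, and the example read are all assumptions made for illustration, and the trained sequence classifier and flanking-sequence alignment are not shown.

```python
import re
from collections import Counter


def longest_repeat_stretch(read, unit_len=4, min_repeats=3):
    """Return (start, end, unit) for the longest tandem repeat run, or None.

    Hypothetical helper: assumes a fixed repeat-unit length and exact repeats.
    """
    # A unit of `unit_len` bases followed by at least `min_repeats - 1` copies of itself.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_repeats - 1))
    best = None
    # Try every start offset so that runs in any repeat frame are considered.
    for i in range(len(read)):
        m = pattern.match(read, i)
        if m and (best is None or m.end() - m.start() > best[1] - best[0]):
            best = (m.start(), m.end(), m.group(1))
    return best


def kmer_counts(seq, k=4):
    """Count overlapping k-mers; such counts could serve as classifier features."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


# Illustrative read (invented): 6-base flank, five TCTA repeats, 6-base flank.
read = "GGCTAG" + "TCTA" * 5 + "CCGTAG"
stretch = longest_repeat_stretch(read)  # (6, 26, 'TCTA')
start, end, unit = stretch
features = kmer_counts(read[start:end])  # k-mer features of the repeat region
left_flank, right_flank = read[:start], read[end:]  # candidates for flank alignment
```

In a full pipeline, the k-mer features would be passed to a trained model to predict the locus, and the flanks would be aligned against that locus's reference flanking sequences to fix the exact STR boundaries.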

Authors

  • Yao-Yuan Liu
    Forensic Science Program, School of Chemical Sciences, University of Auckland, 38 Princes Street, Auckland 1010, New Zealand.
  • David Welch
    School of Computer Science, University of Auckland, 38 Princes Street, Auckland 1010, New Zealand.
  • Ryan England
    Forensic Science Program, School of Chemical Sciences, University of Auckland, 38 Princes Street, Auckland 1010, New Zealand; Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand.
  • Janet Stacey
    Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand.
  • SallyAnn Harbison
    Institute of Environmental Science and Research Limited, Private Bag 92021, Auckland 1142, New Zealand. Electronic address: sallyann.harbison@esr.cri.nz.