DoBSeqWF: A framework for sensitive detection of individual genetic variation in pooled sequencing data

Journal: medRxiv
Published Date:

Abstract

Population screening for rare genetic diseases is limited by the high cost of next-generation sequencing. Double-batched sequencing (DoBSeq) is a cost-effective method for assigning rare variants to individuals using two-dimensional unique double-pooled sequencing. However, this method produces complex, high-depth sequencing data that requires a specialized workflow for efficient and reproducible analysis. We developed DoBSeqWF (DoBSeq Workflow), a Nextflow-based pipeline for processing the pooled sequencing data from alignment through variant calling, filtering, and ultimately individual assignment of rare variants. Using separate training and validation datasets with whole genome sequencing as the gold standard, we benchmarked multiple variant callers, and we developed and implemented machine learning filters that improve rare variant calling performance while maintaining high sensitivity. The pipeline enables reproducible analysis and can be easily updated as bioinformatic tools and variant interpretations evolve. DoBSeqWF is freely available at https://github.com/RasmussenLab/DoBSeqWF. Contact: [email protected]
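
The individual assignment step can be illustrated with a minimal sketch. It is not the DoBSeqWF implementation. It assumes that each individual is placed in exactly one row pool and one column pool of a two-dimensional layout, and that a rare variant is assigned only when it is called in exactly one pool of each dimension. The function name `assign_variants`, the pool labels, the sample names, and the variant strings are all hypothetical.

```python
def assign_variants(row_calls, col_calls, layout):
    """Assign variants to individuals from two-dimensional pooled calls.

    row_calls / col_calls: dict mapping pool label -> set of variant keys
    layout: dict mapping (row_pool, col_pool) -> sample name
    Returns a dict mapping variant key -> sample name for variants seen in
    exactly one row pool and exactly one column pool (assumed rule).
    """
    def pools_per_variant(calls):
        # Invert pool -> variants into variant -> set of pools.
        hits = {}
        for pool, variants in calls.items():
            for variant in variants:
                hits.setdefault(variant, set()).add(pool)
        return hits

    row_hits = pools_per_variant(row_calls)
    col_hits = pools_per_variant(col_calls)

    assignments = {}
    # Only variants called in both dimensions can be pinned to a cell.
    for variant in row_hits.keys() & col_hits.keys():
        rows, cols = row_hits[variant], col_hits[variant]
        if len(rows) == 1 and len(cols) == 1:
            (row,), (col,) = rows, cols
            sample = layout.get((row, col))
            if sample is not None:
                assignments[variant] = sample
        # Variants seen in several pools of one dimension are ambiguous
        # under this rule and are left unassigned.
    return assignments


# Hypothetical 2 x 2 layout with four individuals.
layout = {
    ("R1", "C1"): "S1", ("R1", "C2"): "S2",
    ("R2", "C1"): "S3", ("R2", "C2"): "S4",
}
row_calls = {"R1": {"chr1:100:A>T"}, "R2": {"chr2:200:G>C", "chr3:5:C>T"}}
col_calls = {"C1": {"chr1:100:A>T", "chr3:5:C>T"},
             "C2": {"chr2:200:G>C", "chr3:5:C>T"}}

result = assign_variants(row_calls, col_calls, layout)
```

In this toy input, `chr3:5:C>T` appears in both column pools and is therefore left unassigned, while the other two variants each map to a single cell of the layout. A real workflow would operate on filtered variant calls and would need a policy for variants shared by several individuals.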

Authors

  • Mads Cort Nielsen; Christian Munch Hagen; Ulrik Kristoffer Stoltze; Thomas van Overeem Hansen; Mette Nyegaard; Henrik Hjalgrim; Marie Bækvad-Hansen; Anna Byrjalsen; Kjeld Schmiegelow; Karin Wadt; Jonas Bybjerg-Grauholm; Simon Rasmussen