A Modular Open Source Framework for Genomic Variant Calling
Journal: arXiv
Published Date: Nov 18, 2024
Abstract
Variant calling is a fundamental task in genomic research, essential for
detecting genetic variations such as single nucleotide polymorphisms (SNPs) and
insertions or deletions (indels). This paper presents an enhancement to
DeepChem, a widely used open-source drug discovery framework, through the
integration of DeepVariant. In particular, we introduce a variant calling
pipeline that leverages DeepVariant's convolutional neural network (CNN)
architecture to improve the accuracy and reliability of variant detection. The
implemented pipeline includes stages for realignment of sequencing reads,
candidate variant detection, and pileup image generation, followed by variant
classification using a modified Inception v3 model. Our work adds a modular and
extensible variant calling framework to DeepChem and lays the groundwork for
integrating DeepChem's drug discovery infrastructure more tightly with
bioinformatics pipelines.
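
To make the middle stages of the pipeline concrete, the following is a minimal sketch of candidate variant detection and pileup encoding on toy data. It is not the DeepChem or DeepVariant implementation: the function names `find_candidates` and `pileup_window`, the thresholds, and the integer base encoding are all hypothetical and chosen for illustration. Reads are assumed to be already aligned (so realignment is skipped), only substitutions are considered, and the CNN classification step is omitted. Real pileup images are likely to carry more information per position (for example base and mapping quality) than this single-channel encoding.

```python
from collections import Counter

# Hypothetical integer encoding for bases; 0 means "no coverage at this position".
BASE_CODES = {"A": 1, "C": 2, "G": 3, "T": 4}


def find_candidates(reference, reads, min_alt_fraction=0.3, min_alt_count=2):
    """Return (position, alt_base) pairs where enough reads disagree with the reference.

    `reads` is a list of (start, sequence) tuples, assumed already aligned.
    """
    counts = [Counter() for _ in reference]
    for start, seq in reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                counts[pos][base] += 1

    candidates = []
    for pos, counter in enumerate(counts):
        depth = sum(counter.values())
        for base, n in counter.items():
            is_alt = base != reference[pos]
            if is_alt and n >= min_alt_count and n / depth >= min_alt_fraction:
                candidates.append((pos, base))
    return candidates


def pileup_window(reference, reads, position, half_width=2):
    """Encode the reference row, then one row per overlapping read, around `position`."""
    lo, hi = position - half_width, position + half_width + 1

    def encode(start, seq):
        row = []
        for pos in range(lo, hi):
            idx = pos - start
            covered = 0 <= idx < len(seq)
            row.append(BASE_CODES.get(seq[idx], 0) if covered else 0)
        return row

    rows = [encode(0, reference)]
    for start, seq in reads:
        if start < hi and start + len(seq) > lo:  # read overlaps the window
            rows.append(encode(start, seq))
    return rows


# Toy data: three of the four reads carry a T where the reference has A (position 4).
reference = "ACGTACGTAC"
reads = [(0, "ACGTAC"), (2, "GTTCGT"), (3, "TTCGTAC"), (4, "TCGTAC")]

candidates = find_candidates(reference, reads)
images = {cand: pileup_window(reference, reads, cand[0]) for cand in candidates}
```

In a full pipeline, each entry of `images` would be converted to a tensor and passed to the classifier, which assigns a genotype to the corresponding candidate.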