Refinement of genetic variants needs attention
Journal: arXiv
Published Date: Aug 1, 2024
Abstract
Variant calling refinement is crucial for distinguishing true genetic
variants from technical artifacts in high-throughput sequencing data. Manual
review is time-consuming, while heuristic filtering often yields suboptimal
results. Traditional variant calling methods also struggle with accuracy,
especially in regions of low read coverage, leading to false-positive or
false-negative calls. Here, we introduce VariantTransformer, a
Transformer-based deep learning model designed to automate variant calling
refinement directly from VCF files in low-coverage data (10-15X).
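To make "directly from VCF files" concrete, the sketch below parses a single tab-separated VCF data line into a few per-variant features: QUAL, read depth from INFO/DP, and a coarse SNP/InDel label. It is a minimal illustration under stated assumptions: the `VariantFeatures` fields and the `parse_vcf_record` helper are hypothetical, and the abstract does not say which VCF fields VariantTransformer actually encodes or how.

```python
from dataclasses import dataclass


@dataclass
class VariantFeatures:
    """Hypothetical per-variant features; not VariantTransformer's actual input encoding."""
    chrom: str
    pos: int
    variant_type: str  # "SNP", "InDel", or "other"
    qual: float        # 0.0 when QUAL is missing (".")
    depth: int         # INFO/DP, 0 when absent


def parse_vcf_record(line: str) -> VariantFeatures:
    """Parse one tab-separated VCF data line (not a header line) into simple features."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, _id, ref, alt, qual, _filter, info = fields[:8]

    # INFO is a semicolon-separated list of KEY=VALUE pairs or bare flags ("." if empty).
    info_map = {}
    for entry in info.split(";"):
        key, sep, value = entry.partition("=")
        info_map[key] = value if sep else True

    # Coarse variant type: single-base REF and ALT alleles -> SNP; any length change -> InDel.
    alts = alt.split(",")
    if len(ref) == 1 and all(len(a) == 1 for a in alts):
        variant_type = "SNP"
    elif any(len(a) != len(ref) for a in alts):
        variant_type = "InDel"
    else:
        variant_type = "other"  # e.g. multi-nucleotide substitutions

    return VariantFeatures(
        chrom=chrom,
        pos=int(pos),
        variant_type=variant_type,
        qual=float(qual) if qual != "." else 0.0,
        depth=int(info_map.get("DP", 0)),
    )


# Usage with an invented record: a one-base insertion at 12X depth.
record = "chr1\t12345\t.\tA\tAT\t37.5\tPASS\tDP=12;AF=0.5"
print(parse_vcf_record(record))
```

A production pipeline would more likely read records through a dedicated VCF library, which also handles header lines, per-sample FORMAT fields, and compressed input.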
Trained on two million variants (SNPs and short InDels) from low-coverage
sequencing data, VariantTransformer achieved an accuracy of 89.26% and a ROC
AUC of 0.88. When integrated into conventional variant calling pipelines, it
outperformed traditional heuristic filters and approached the performance of
state-of-the-art AI-based variant callers such as DeepVariant.
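The comparison with heuristic filters rests on two metrics, accuracy and ROC AUC. The sketch below shows how both can be computed for a keep/reject decision on each call, using a hypothetical hard filter (`min_qual=30`, `min_depth=8`, thresholds chosen only for illustration) and made-up classifier scores standing in for a model's output. ROC AUC is computed here as the probability that a randomly chosen true variant scores higher than a randomly chosen artifact, with ties counted as one half. None of the numbers relate to the paper's results.

```python
from typing import Sequence


def hard_filter(qual: float, depth: int, min_qual: float = 30.0, min_depth: int = 8) -> int:
    """Hypothetical heuristic filter: keep (1) a call only if QUAL and DP clear fixed thresholds."""
    return int(qual >= min_qual and depth >= min_depth)


def accuracy(labels: Sequence[int], predictions: Sequence[int]) -> float:
    """Fraction of calls whose keep/reject decision matches the truth label."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Probability that a random true variant outscores a random artifact (ties count 0.5).

    Pairwise O(P*N) version of the Mann-Whitney formulation; fine for small examples.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data, invented for illustration: 1 = true variant, 0 = technical artifact.
labels = [1, 1, 0, 1, 0, 0]
quals = [45.0, 22.0, 31.0, 60.0, 12.0, 28.0]
depths = [12, 9, 5, 14, 10, 8]
model_scores = [0.91, 0.65, 0.40, 0.88, 0.20, 0.70]  # stand-in classifier probabilities

filter_calls = [hard_filter(q, d) for q, d in zip(quals, depths)]
model_calls = [int(s >= 0.5) for s in model_scores]  # threshold the scores at 0.5

print("hard filter accuracy:", accuracy(labels, filter_calls))
print("model accuracy:", accuracy(labels, model_calls))
print("QUAL-only ROC AUC:", roc_auc(labels, quals))
print("model ROC AUC:", roc_auc(labels, model_scores))
```

In this toy example both decision rules reach the same accuracy (5/6), yet the score-based ranking separates true variants from artifacts better than ranking by QUAL alone (ROC AUC 8/9 versus 7/9). This is why a threshold-free metric such as ROC AUC is informative alongside accuracy.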
A comparative analysis against other variant refinement approaches showed
VariantTransformer's superiority in functionality, variant type coverage,
training set size, and input data type.
VariantTransformer represents a significant advancement in variant calling
refinement for low-coverage genomic studies.