SNooPy: a statistical framework for long-read metagenomic variant calling
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Current long-read single-nucleotide variant callers were designed primarily for genomic data, particularly human genomes. While some have been used on metagenomic data, their underlying assumptions and training procedures fail to account for the inherent complexity of metagenomic samples. To date, no long-read variant caller has been purpose-built for metagenomic applications. To address this gap, we present SNooPy, a SNP-calling tool that implements a new statistical framework tailored to long-read metagenomic data. Unlike previous genomic methods, our approach makes no assumptions about the number of haplotypes present, their evolutionary relationships, or their sequence divergence. We demonstrate that SNooPy outperforms both traditional statistical and deep learning–based SNP callers. Our results suggest that future integration of this framework with deep learning approaches could further enhance variant calling performance.