CoverM: Read alignment statistics for metagenomics
Journal: arXiv
Published Date: Jan 20, 2025
Abstract
Genome-centric analysis of metagenomic samples is a powerful method for
understanding the function of microbial communities. Calculating read coverage
is a central part of this analysis, enabling differential coverage binning for
recovery of genomes and estimation of microbial community composition. Coverage
is determined by processing read alignments to reference sequences, either
contigs or genomes. Per-reference coverage is typically calculated in an ad hoc
manner, with each software package providing its own implementation and
specific definition of coverage. Here we present CoverM, a unified software
package that calculates several coverage statistics for contigs and genomes in
an ergonomic and flexible manner. It uses 'Mosdepth arrays' for computational
efficiency and avoids unnecessary I/O overhead by calculating coverage
statistics from streamed read alignment results. CoverM is free software
available at https://github.com/wwood/coverm. CoverM is implemented in Rust,
with Python (https://github.com/apcamargo/pycoverm) and Julia
(https://github.com/JuliaBinaryWrappers/CoverM_jll.jl) interfaces.
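
To illustrate the idea behind 'Mosdepth arrays', the following is a minimal Python sketch of the general difference-array technique popularized by Mosdepth: each alignment adds +1 at its start and -1 at its end, and a single cumulative sum then yields per-base depth. It is not CoverM's Rust implementation. The function names (depth_from_alignments, mean_coverage, covered_fraction) and the half-open, 0-based interval convention are assumptions made for this example, and CoverM's own definitions of its coverage statistics may differ.

```python
from itertools import accumulate


def depth_from_alignments(ref_len, alignments):
    """Per-base depth for one reference sequence.

    `alignments` is an iterable of (start, end) pairs, assumed here to be
    0-based and half-open. Alignments can be consumed one at a time, so
    they may come from a stream rather than a fully loaded file.
    """
    # One extra slot so that an alignment ending at ref_len has a place
    # to record its decrement.
    delta = [0] * (ref_len + 1)
    for start, end in alignments:
        start = max(start, 0)
        end = min(end, ref_len)
        if start >= end:
            continue  # skip empty or out-of-range intervals
        delta[start] += 1  # coverage begins at `start`
        delta[end] -= 1    # coverage stops just before `end`
    # The running sum of the deltas is the depth at each position.
    return list(accumulate(delta[:-1]))


def mean_coverage(depth):
    """Mean depth across all positions of the reference."""
    return sum(depth) / len(depth) if depth else 0.0


def covered_fraction(depth):
    """Fraction of positions covered by at least one alignment."""
    return sum(1 for d in depth if d > 0) / len(depth) if depth else 0.0


if __name__ == "__main__":
    depth = depth_from_alignments(10, [(0, 4), (2, 6), (5, 10)])
    print(depth)
    print(mean_coverage(depth), covered_fraction(depth))
```

Each alignment costs two array updates regardless of its length, and the per-base depth is produced in one pass over the reference, which is why this layout suits long references and streamed input.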