SNV and indel error modeling of deep targeted cell-free DNA sequencing data for sensitive detection of circulating tumor DNA in colorectal cancer

Journal: bioRxiv
Published Date:

Abstract

Circulating tumor DNA (ctDNA) is a promising biomarker for cancer detection, but low tumor burden makes it difficult to distinguish true signal from background noise. To aggregate and better evaluate weak mutational signals, we propose PyDREAMS, which incorporates both single-nucleotide variants (SNVs) and insertions and deletions (indels) for ctDNA detection and quantification. To distinguish signal from noise, a neural network background error model is learned from healthy controls. It captures the joint effects of cell-free DNA (cfDNA)-specific lesions and sequencing errors, accounting for both genomic context and read-level features. Finally, a statistical test is used to evaluate the presence of mutational signals. We evaluate the method in a tumor-informed setting, using cohorts of colorectal cancer samples with deep targeted plasma cfDNA sequencing across 12 cancer driver genes. We trained PyDREAMS on 46 healthy controls, with feature analysis revealing that both SNV and indel error rates were lowest at mononucleosomal fragment lengths, suggesting that nucleosomes protect cfDNA and reduce lesion accumulation during circulation and sample handling. In the validation cohort, combining SNVs with indels improved detection, with indels contributing approximately 1.5-fold more evidence per mutation than SNVs. On a test cohort of 209 stage I to III colorectal cancer (CRC) patients and 24 healthy controls, PyDREAMS outperformed a Shearwater-based caller, with an area under the receiver operating characteristic curve (AUC) of 0.917 compared with 0.909. In stage III post-operative (Post-OP) samples (n = 26), where ctDNA was expected only in non-cured patients, PyDREAMS detected ctDNA in 5 patients, including 3 of 9 with later recurrence, while Shearwater detected none. Together, these results show that PyDREAMS improves evaluation of ultra-low-frequency tumor signals through unified read-level modelling of SNV and indel background error.
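The idea of aggregating weak read-level evidence and then testing for signal can be illustrated with a minimal sketch. This is not the PyDREAMS implementation: the function names, the simple mixture `p = f + (1 - f) * e`, the grid search, and the boundary-corrected chi-square approximation are all assumptions chosen for illustration. Each read covering a tumor-informed position carries a background error probability `e` (here assumed to come from a model trained on healthy controls), and the test asks whether a tumor fraction `f > 0` explains the observed mutant reads better than background error alone.

```python
import math

def log_likelihood(f, observed, error_probs):
    """Bernoulli log-likelihood of read-level observations at tumor fraction f."""
    total = 0.0
    for x, e in zip(observed, error_probs):
        # Assumed mixture: a read shows the mutant allele either because it is
        # tumor-derived (f) or because of a background error (e).
        p = f + (1.0 - f) * e
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += math.log(p) if x else math.log(1.0 - p)
    return total

def ctdna_test(observed, error_probs, grid_size=1000, max_fraction=0.1):
    """Grid-search estimate of f and a one-sided likelihood-ratio test of f = 0."""
    null_ll = log_likelihood(0.0, observed, error_probs)
    best_f, best_ll = 0.0, null_ll
    for k in range(1, grid_size + 1):
        f = max_fraction * k / grid_size
        ll = log_likelihood(f, observed, error_probs)
        if ll > best_ll:
            best_f, best_ll = f, ll
    stat = 2.0 * (best_ll - null_ll)
    # Asymptotic p-value: f = 0 lies on the boundary of the parameter space,
    # so half of the chi-square (1 df) tail probability is used.
    p_value = 0.5 * math.erfc(math.sqrt(stat / 2.0)) if stat > 0 else 1.0
    return best_f, stat, p_value
```

Calling `ctdna_test(observed, error_probs)` with one 0/1 entry per read returns the estimated tumor fraction, the test statistic, and an approximate p-value. Reads with a low background error probability contribute more evidence when they carry the mutant allele, which is the intuition behind weighting observations by a read-level error model.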

Authors

  • Diekema, M. H.
  • Rasmussen, M. H.
  • Drue, S. O.
  • Frydendahl, A.
  • Andersen, C. L.
  • Pedersen, J.

Categories