Accurate somatic variant detection from formalin fixed, paraffin embedded tissue (FFPE) derived WES and WGS by DeepOmicsFFPE-PLUS, a sequence context-based transformer

Journal: bioRxiv
Published Date:

Abstract

Formalin-fixed paraffin-embedded (FFPE) tissues represent a vast archival resource for genomic studies, yet their utility remains constrained by fixation-induced DNA damage and subsequent sequencing artifacts. To comprehensively characterize and address this challenge, we analyzed matched FFPE and fresh-frozen tumor samples from two institutions, spanning different storage durations, DNA qualities, sequencing platforms (WES and WGS), exome capture kits, and somatic variant callers. We found that FFPE-induced artifacts exhibit strong batch- and age-specific patterns, with a predominance of C:G>T:A substitutions, which particularly complicate the accurate identification of true variants with low allele frequency. Enzymatic repair methods partially alleviated these artifacts but remained insufficient. To overcome these limitations, we developed DeepOmicsFFPE-PLUS (https://github.com/Theragen-Bio/DeepOmicsFFPE-PLUS), an advanced AI-based tool to accurately distinguish true somatic variants from FFPE-specific artifacts. DeepOmicsFFPE-PLUS demonstrated consistently superior performance across diverse conditions, achieving high sensitivity and specificity, even for low-frequency variants, and outperforming existing tools. Application of our model to WGS data further enabled recovery of biologically relevant mutational signatures, including restoration of microsatellite instability (MSI)-associated signatures initially obscured by FFPE artifacts. Our findings underscore the necessity of artifact-aware variant calling in FFPE genomics and establish DeepOmicsFFPE-PLUS as a robust tool for artifact removal, enabling high-fidelity downstream analyses and personalized therapeutic target discovery.

Authors

  • Kim, W. S.
  • Lee, A.
  • Kim, I.
  • Hong, S.-E.
  • Park, J.
  • Lim, J.
  • Noh, E.
  • Kim, M.
  • Park, J.
  • Han, J.
  • Park, M.
  • Cho, H.
  • Shin, J.
  • Yang, Y.
  • Paik, S.
  • Heo, D.-h.

Categories