Unimeth: A unified transformer framework for accurate DNA methylation detection from nanopore reads
Journal: bioRxiv
Published Date: Jan 1, 2025
Abstract
Nanopore sequencing has emerged as a powerful technology for DNA methylation detection, particularly in repetitive genomic regions and at the haplotype scale. However, existing computational methods show inconsistent accuracy across sequence contexts, species, and sequencing chemistries. Here, we present Unimeth, a unified transformer-based framework that simultaneously predicts methylation at multiple sites from nanopore reads. Unimeth employs a patch-based architecture and a three-phase training strategy, comprising pre-training, read-level fine-tuning, and site-level calibration, to fully leverage genome-wide methylation information. In comprehensive benchmarks involving 20 samples spanning 13 species, Unimeth consistently outperforms state-of-the-art methods. This unified approach demonstrates superior accuracy and significantly reduced false positives across a wide range of scenarios, including the detection of both 5mC and 6mA, application in organisms from mammals and plants to bacteria, analysis of both wild-type and mutant samples, and use of both R10.4 and R9.4 pore chemistries. Furthermore, Unimeth proves highly accurate for methylation analysis in transposons and centromeric regions.
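
As an illustration of what "patch-based" can mean for a sequence model, the sketch below splits a one-dimensional signal into fixed-length, non-overlapping patches, each of which could then be embedded as one transformer token. This is a generic patching scheme and not Unimeth's actual implementation: the abstract does not specify the patch length, the padding rule, or whether patches overlap, so the function name `split_into_patches` and its parameters `patch_len` and `pad_value` are assumptions made for illustration only.

```python
from typing import List


def split_into_patches(
    signal: List[float], patch_len: int, pad_value: float = 0.0
) -> List[List[float]]:
    """Split a 1-D signal into consecutive, non-overlapping patches.

    Hypothetical helper for illustration; not taken from Unimeth.
    The final patch is right-padded with pad_value when the signal
    length is not a multiple of patch_len.
    """
    if patch_len <= 0:
        raise ValueError("patch_len must be positive")

    patches: List[List[float]] = []
    for start in range(0, len(signal), patch_len):
        patch = signal[start:start + patch_len]
        # Pad the last patch so every patch has the same length.
        patch = patch + [pad_value] * (patch_len - len(patch))
        patches.append(patch)
    return patches


# Example: five samples with patch_len=2 give three patches,
# the last of which is padded with one pad_value.
example = split_into_patches([1.0, 2.0, 3.0, 4.0, 5.0], patch_len=2)
```

Fixed-length patches shorten the token sequence by a factor of `patch_len`, which is the usual motivation for patching long raw signals before self-attention.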