Epigenetic conditioning improves sequence-based modeling of gene regulation across cell types and alleles

Journal: bioRxiv
Published Date:

Abstract

Epigenetic state modulates gene regulation in a manner not always predictable from DNA sequence alone, yet current genomic deep learning models do not take epigenetic state as input. We present MethylSeqNet, a model that conditions pretrained sequence embeddings on CpG methylation, a stable epigenetic mark increasingly available from long-read sequencing data. Using a novel conditioning mechanism that enables scalability and interpretability, MethylSeqNet improves predictions in cases where differential epigenetic state drives regulatory variation. We show improvements over a sequence-only baseline in predicting cell-type-specific chromatin accessibility and transcription. Epigenetic conditioning enables prediction of phenomena not encoded in allele sequence, including parent-of-origin imprinting, random monoallelic activity, and X-inactivation. We highlight a promising application of methylation conditioning by predicting the effects of a structural rearrangement in a case study of a rare disease patient. In silico motif insertion analysis confirms that MethylSeqNet learns methylation-dependent regulatory grammar, establishing a paradigm for integrating epigenetic information into genomic deep learning with immediate applications in rare disease interpretation.
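The abstract does not describe how MethylSeqNet conditions sequence embeddings on methylation. Purely as an illustration of the general idea, the sketch below uses a swapped-in, generic technique: FiLM-style (feature-wise linear modulation) scaling and shifting of each position's embedding by its CpG methylation fraction. This is not the authors' mechanism. The function name `film_condition` and the weight vectors `w_scale` and `w_shift` are hypothetical, and the embeddings stand in for the output of some pretrained sequence model.

```python
from typing import List, Sequence


def film_condition(
    embeddings: Sequence[Sequence[float]],
    methylation: Sequence[float],
    w_scale: Sequence[float],
    w_shift: Sequence[float],
) -> List[List[float]]:
    """Hypothetical FiLM-style conditioning; NOT the MethylSeqNet mechanism.

    embeddings:  per-position sequence embeddings, shape (L, D)
    methylation: per-position CpG methylation fraction in [0, 1], length L
                 (use 0.0 at non-CpG or unmethylated positions)
    w_scale, w_shift: hypothetical learned weights, each of length D
    """
    if len(embeddings) != len(methylation):
        raise ValueError("need one methylation value per position")
    if len(w_scale) != len(w_shift):
        raise ValueError("w_scale and w_shift must have the same length")
    out: List[List[float]] = []
    for emb, m in zip(embeddings, methylation):
        if len(emb) != len(w_scale):
            raise ValueError("embedding width must match weight length")
        if not 0.0 <= m <= 1.0:
            raise ValueError("methylation fraction must lie in [0, 1]")
        # Scale by (1 + m * w_scale) and shift by m * w_shift, so positions
        # with zero methylation keep their original sequence embedding.
        out.append([e * (1.0 + m * ws) + m * wb
                    for e, ws, wb in zip(emb, w_scale, w_shift)])
    return out


if __name__ == "__main__":
    emb = [[1.0, 2.0], [3.0, 4.0]]
    meth = [0.0, 1.0]  # second position fully methylated
    print(film_condition(emb, meth, w_scale=[0.5, -0.5], w_shift=[0.1, 0.2]))
```

In this toy design the unmethylated position passes through unchanged while the fully methylated one is modulated, which is one simple way a single sequence could yield different representations for different epigenetic states (for example, the two alleles of an imprinted locus).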

Authors

  • Dixon-Luinenburg, O.
  • Bajwa, A.
  • Vollger, M. R.
  • Stergachis, A.
  • Streets, A. M.
  • Ioannidis, N. M.

Categories