Deep learning deciphers the related role of master regulators and G-quadruplexes in tissue specification.

Journal: Scientific Reports
Published Date:

Abstract

G-quadruplexes (GQs) are non-canonical DNA structures encoded by G-flipons with potential roles in gene regulation and chromatin structure. Here, we explore the role of G-flipons in tissue specification. We present a deep learning-based framework for genome-wide G-flipon prediction across 14 human tissue types. The model was trained using high-confidence experimental maps of GQ-forming sequences and ATAC-seq peaks, combined with the locations of RNA polymerase, histone marks, and transcription factor binding sites. The training dataset for the DeepGQ model was derived from EndoQuad level 4-6 GQs. Model predictions were subsequently validated against the comprehensive EndoQuad dataset (levels 1-6) to optimize the whole-genome prediction threshold. To identify tissue-specific regulatory patterns, we classified GQ promoter predictions as either 'core' or 'tissue-specific'. We identified a notable overlap between predicted unique tissue-specific GQ sites and master regulatory genes (MRGs), tissue-specific DNase-hypersensitivity sites, and proteins that modulate R-loop formation. Collectively, the findings highlight the transactions between MRGs and G-flipons mediated by RNA:DNA hybrids associated with tissue specification.
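
The 'core' versus 'tissue-specific' split of promoter predictions can be illustrated with a minimal sketch. The abstract does not state the exact criteria, so the definitions used here are assumptions: a promoter is treated as 'core' if a GQ is predicted there in every tissue, and as 'tissue-specific' if it is predicted in exactly one tissue. The function name classify_promoters, the promoter IDs, and the three toy tissues are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def classify_promoters(predictions):
    """Split promoters by how many tissues carry a predicted GQ.

    predictions: dict mapping tissue name -> set of promoter IDs with a
    predicted GQ in that tissue.

    ASSUMED definitions (not given in the abstract):
      core            = predicted in every tissue
      tissue-specific = predicted in exactly one tissue
    Everything in between is returned separately as 'intermediate'.
    """
    n_tissues = len(predictions)

    # Invert the mapping: promoter -> set of tissues where it is predicted.
    tissues_by_promoter = defaultdict(set)
    for tissue, promoters in predictions.items():
        for promoter in promoters:
            tissues_by_promoter[promoter].add(tissue)

    core = set()
    tissue_specific = {}  # promoter -> the single tissue it is unique to
    intermediate = set()
    for promoter, tissues in tissues_by_promoter.items():
        if len(tissues) == n_tissues:
            core.add(promoter)
        elif len(tissues) == 1:
            tissue_specific[promoter] = next(iter(tissues))
        else:
            intermediate.add(promoter)
    return core, tissue_specific, intermediate


# Toy input with placeholder promoter IDs (illustrative only, not real data).
toy_predictions = {
    "tissue_A": {"P1", "P2", "P5"},
    "tissue_B": {"P1", "P3", "P5"},
    "tissue_C": {"P1", "P4"},
}
core, tissue_specific, intermediate = classify_promoters(toy_predictions)
```

With this toy input, P1 falls in the core set, P2, P3 and P4 are each unique to one tissue, and P5 is neither. The study covers 14 tissues; a real analysis would also need a rule for matching predicted GQ intervals to promoters, which is omitted here.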

Authors

  • Artem Bashkatov
    International Laboratory of Bioinformatics, HSE University, Moscow, Russia.
  • Andrey Andreasyan
    International Laboratory of Bioinformatics, HSE University, Moscow, Russia.
  • Dmitry Konovalov
    International Laboratory of Bioinformatics, HSE University, Moscow, Russia.
  • Alan Herbert
    International Laboratory of Bioinformatics, HSE University, Moscow, Russia. alan.herbert@insideoutbio.com.
  • Maria Poptsova
    International Laboratory of Bioinformatics, AI and Digital Sciences Institute, Faculty of Computer Science, HSE University, Moscow, Russia.