Prediction of Transcription Factor DNA Binding Affinity with High-Throughput Kd Measurements and Deep Learning

Journal: bioRxiv
Published Date:

Abstract

Transcription factors (TFs) regulate gene expression through specific interactions with genomic DNA. While TF binding motifs from public databases describe sequence preferences, quantifying genome-wide affinity (the dissociation constant, Kd) is highly desirable for a more accurate thermodynamic description. Here, we report ivtFOODIE (in vitro FOOtprinting with DeamInasE), an assay that leverages deaminase-mediated cytosine-to-uracil conversion to measure Kd values for a given TF across accessible genomic regions from human cells. By pre-training on TF binding sites from JASPAR and fine-tuning with our ivtFOODIE data from 46 TFs representing 13 different DNA-binding domains (DBDs), we developed Seq2Kd, a deep learning model capable of predicting a TF's absolute binding affinity on DNA sequences. Seq2Kd enables de novo motif discovery for ~500 previously uncharacterized human TFs and reveals the effects of genetic variation, both in TF-coding regions and in DNA-binding sites, on gene expression and disease susceptibility. By correlating predicted affinity changes with the sign and magnitude of expression quantitative trait locus (eQTL) effects, we stratified TFs into activator-like and repressor-like groups. Compared to clinically benign variants, pathogenic single-nucleotide variants (SNVs) within regulatory and protein-coding regions show significantly larger predicted shifts in Kd. We provide an interactive web portal, the ENcyclopedia of Transcription-factor Interactions with Regulatory Elements (ENTIRE), which integrates the Seq2Kd model with the ivtFOODIE dataset. This resource offers thermodynamic predictions of TF-DNA interactions for studies of functional genomics and human disease.
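
To make the thermodynamic framing concrete, the sketch below shows how a Kd value relates to binding free energy and site occupancy, and how a variant's shift in Kd might be summarized. It uses only textbook relations for simple 1:1 binding; it is not the Seq2Kd model or the ivtFOODIE analysis, and the function names, the 298.15 K temperature, and the 1 M standard state are assumptions made for illustration.

```python
import math

# Gas constant in kcal/(mol*K); the temperature default below is an assumption.
R_KCAL = 1.987204e-3


def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol): dG = RT * ln(Kd / 1 M)."""
    return R_KCAL * temperature_k * math.log(kd_molar)


def fractional_occupancy(tf_conc_molar, kd_molar):
    """Equilibrium occupancy of a single site under simple 1:1 binding."""
    return tf_conc_molar / (tf_conc_molar + kd_molar)


def log2_kd_shift(kd_ref_molar, kd_alt_molar):
    """log2 fold change in Kd between alternate and reference alleles.

    Positive values mean the alternate allele binds more weakly.
    """
    return math.log2(kd_alt_molar / kd_ref_molar)


if __name__ == "__main__":
    # Hypothetical values, not taken from the dataset.
    kd_ref, kd_alt, tf_conc = 1e-9, 8e-9, 1e-9
    print(binding_free_energy(kd_ref))
    print(fractional_occupancy(tf_conc, kd_ref), fractional_occupancy(tf_conc, kd_alt))
    print(log2_kd_shift(kd_ref, kd_alt))
```

A lower Kd corresponds to tighter binding, so at a fixed TF concentration a variant that raises Kd lowers the expected occupancy of that site.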

Authors

  • Wang, Z.
  • Wang, D.
  • Shen, K.
  • Luo, J.
  • Wang, X.
  • Wu, N.
  • Lang, Y.
  • Wang, X.
  • Ren, J.
  • Dong, W.
  • Pan, L.
  • Li, G.
  • Li, D.
  • Xie, C.
  • Zhang, Z.
  • Lyu, Y.
  • Yu, S.
  • Shan, L.
  • Zhang, N.
  • Yan, J.
  • Chen, M.
  • Xie, X. S.

Categories