Prediction of Transcription Factor DNA Binding Affinity with High-Throughput Kd Measurements and Deep Learning
Journal:
bioRxiv
Published Date:
May 20, 2026
Abstract
Transcription factors (TFs) regulate gene expression through specific interactions with genomic DNA. While TF binding motifs from public databases describe sequence preferences, quantifying binding affinity genome-wide, as the dissociation constant (Kd), is highly desirable for a more accurate thermodynamic description. Here, we report ivtFOODIE (in vitro FOOtprinting with DeamInasE), an assay that leverages deaminase-mediated cytosine-to-uracil conversion to measure Kd values for a given TF across accessible genomic regions from human cells. By pre-training on TF binding sites from JASPAR and fine-tuning on our ivtFOODIE data from 46 TFs representing 13 different DNA-binding domains (DBDs), we developed Seq2Kd, a deep learning model capable of predicting a TF's absolute binding affinity for DNA sequences. Seq2Kd enables de novo motif discovery for ~500 previously uncharacterized human TFs and reveals the effects of genetic variation in both TF-coding regions and DNA-binding sites on gene expression and disease susceptibility. By correlating predicted affinity changes with the sign and magnitude of expression quantitative trait locus (eQTL) effects, we stratified TFs into activator-like and repressor-like groups. Compared with clinically benign variants, pathogenic single-nucleotide variants (SNVs) within regulatory and protein-coding regions show significantly larger predicted shifts in Kd. We provide an interactive web portal, the ENcyclopedia of Transcription-factor Interactions with Regulatory Elements (ENTIRE), which integrates the Seq2Kd model with the ivtFOODIE dataset. This resource offers thermodynamic predictions of TF-DNA interactions for studies of functional genomics and human disease.
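To illustrate why an absolute Kd is more informative than a motif score, the following minimal sketch converts a Kd into equilibrium site occupancy and expresses a variant's Kd shift as a change in binding free energy. It assumes simple 1:1 binding at a single site, which is a textbook simplification and not necessarily the model used by Seq2Kd. The function names and all numerical values are hypothetical and are not taken from the ivtFOODIE data.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def occupancy(tf_conc_nM, kd_nM):
    """Equilibrium fractional occupancy of one site, assuming 1:1 binding."""
    return tf_conc_nM / (tf_conc_nM + kd_nM)


def ddg_kcal(kd_ref_nM, kd_alt_nM, temp_K=298.15):
    """Change in binding free energy, RT * ln(Kd_alt / Kd_ref).

    A positive value means the variant weakens binding.
    """
    return R_KCAL * temp_K * math.log(kd_alt_nM / kd_ref_nM)


# Hypothetical example: a variant raises Kd tenfold at a fixed TF concentration.
kd_ref, kd_alt, tf_conc = 10.0, 100.0, 50.0  # all in nM, illustrative only
occ_ref = occupancy(tf_conc, kd_ref)
occ_alt = occupancy(tf_conc, kd_alt)
shift = ddg_kcal(kd_ref, kd_alt)
print(f"occupancy ref={occ_ref:.3f} alt={occ_alt:.3f} ddG={shift:.2f} kcal/mol")
```

Under these assumptions, the same tenfold change in Kd has a large effect on occupancy when the TF concentration is close to Kd and a small effect when the TF is far in excess, which is information a relative motif score alone does not provide.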