Predictive biophysical neural network modeling of a compendium of in vivo transcription factor DNA binding profiles for Escherichia coli.

Journal: Nature Communications
PMID:

Abstract

The DNA binding of most Escherichia coli transcription factors (TFs) has not been comprehensively mapped, and few TFs have models that can quantitatively predict binding affinity. We report the global mapping of in vivo DNA binding for 139 E. coli TFs using ChIP-Seq. We use these data to train BoltzNet, a novel neural network that predicts TF binding energy from DNA sequence. BoltzNet mirrors a quantitative biophysical model and provides directly interpretable predictions genome-wide at nucleotide resolution. We use BoltzNet to quantitatively design novel binding sites, which we validate with biophysical experiments on purified protein. We generate models for 124 TFs that provide insight into global features of TF binding, including clustering of sites, the role of accessory bases, the relevance of weak sites, and the background affinity of the genome. Our paper provides new paradigms for studying TF-DNA binding and for the development of biophysically motivated neural networks.
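
The idea of predicting binding energy from sequence can be illustrated with a generic additive energy model and a two-state Boltzmann occupancy. This is a minimal sketch of that textbook formulation, not BoltzNet's architecture, which the abstract does not describe. The energy matrix, its values, the 4-bp motif width, and the function names `site_energy`, `occupancy`, and `scan` are all hypothetical.

```python
import math

# Hypothetical per-position energy contributions (in units of kT) for a
# toy 4-bp motif. Values are illustrative only, not taken from the paper.
ENERGY_MATRIX = [
    {"A": 0.0, "C": 2.0, "G": 2.0, "T": 1.5},
    {"A": 2.0, "C": 0.0, "G": 1.5, "T": 2.0},
    {"A": 2.0, "C": 1.5, "G": 0.0, "T": 2.0},
    {"A": 1.5, "C": 2.0, "G": 2.0, "T": 0.0},
]


def site_energy(site):
    """Sum additive per-base energies for one site (lower means tighter binding)."""
    return sum(position[base] for position, base in zip(ENERGY_MATRIX, site))


def occupancy(energy, mu=0.0):
    """Two-state Boltzmann occupancy; energy and chemical potential mu are in kT."""
    return 1.0 / (1.0 + math.exp(energy - mu))


def scan(sequence, mu=0.0):
    """Return (offset, energy, occupancy) for every motif-width window."""
    width = len(ENERGY_MATRIX)
    results = []
    for offset in range(len(sequence) - width + 1):
        energy = site_energy(sequence[offset:offset + width])
        results.append((offset, energy, occupancy(energy, mu)))
    return results


if __name__ == "__main__":
    for offset, energy, occ in scan("TTACGTAA"):
        print(offset, energy, round(occ, 4))
```

In this toy model the lowest-energy window is the predicted strongest site, and raising `mu` (a stand-in for higher TF concentration) increases the occupancy of weak sites as well as strong ones.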

Authors

  • Patrick Lally
    Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA.
  • Laura Gómez-Romero
    Instituto Nacional de Medicina Genómica, Periférico Sur 4809, Arenal Tepepan, Ciudad de México, México, México.
  • Víctor H Tierrafría
    Computational Genomics, Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México (UNAM), A.P. 565-A, Cuernavaca, Morelos, 62100, México.
  • Patricia Aquino
    Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA.
  • Claire Rioualen
    Centro de Ciencias Genómicas, Universidad Nacional Autónoma de México, Avenida Universidad s/n, Cuernavaca, Morelos, México.
  • Xiaoman Zhang
    Department of Neurology, The First Hospital of Hebei Medical University, Shijiazhuang, Hebei, China.
  • Sunyoung Kim
    Department of Family Medicine, Kyung Hee University Hospital, Seoul, Republic of Korea.
  • Gabriele Baniulyte
    Wadsworth Center, New York State Department of Health, Albany, NY, USA.
  • Jonathan Plitnick
    Wadsworth Center, New York State Department of Health, Albany, NY, USA.
  • Carol Smith
    Wadsworth Center, New York State Department of Health, Albany, NY, USA.
  • Mohan Babu
    Department of Biochemistry, University of Regina, Regina, SK, Canada.
  • Julio Collado-Vides
    Computational Genomics Program, Center for Genomic Sciences, National Autonomous University of Mexico, Av. Universidad, s/n, Colonia Chamilpa, Cuernavaca, Morelos 62100, Mexico.
  • Joseph T Wade
    Wadsworth Center, New York State Department of Health, Albany, NY, USA.
  • James E Galagan
    Department of Biomedical Engineering, Boston University, 44 Cummington Mall, Boston, MA, USA. jgalag@bu.edu.