Specific and intrinsic sequence patterns extracted by deep learning from intra-protein binding and non-binding peptide fragments.

Journal: Scientific Reports
Published Date:

Abstract

The key finding in the DNA double helix model is the specific pairing, or binding, between nucleotides A-T and C-G, and these pairing rules are the molecular basis of the genetic code. Unfortunately, no such rules have been discovered for proteins. Here we show that intrinsic sequence patterns between intra-protein binding peptide fragments exist, that they can be extracted using a deep learning algorithm, and that they bear an interesting resemblance to the DNA double helix model. The intra-protein binding peptide fragments have specific and intrinsic sequence patterns, distinct from those of non-binding peptide fragments, and millions of binding and non-binding peptide fragments from currently available protein X-ray structures are classified with an accuracy of up to 93%. The specific binding between short peptide fragments may provide an important driving force for protein folding and protein-protein interaction, two open and fundamental problems in molecular biology, and it may have significant potential in the design, discovery, and development of peptide, protein, and antibody drugs.

Authors

  • Yuhong Wang
    Wuhan Institute for Food and Cosmetic Control, Wuhan 430014, China.
  • Junzhou Huang
  • Wei Li
    Department of Nephrology, The Second Affiliated Hospital of Guangxi Medical University, Nanning, Guangxi, China.
  • Sheng Wang
    Intensive Care Medical Center, Tongji Hospital, School of Medicine, Tongji University, Shanghai, 200065, People's Republic of China.
  • Chuanfan Ding
    Department of Chemistry and Laser Chemistry Institute, Fudan University, Shanghai, 200433, P.R. China.