Folding non-homologous proteins by coupling deep-learning contact maps with I-TASSER assembly simulations.

Journal: Cell Reports Methods
Published Date:

Abstract

Structure prediction for proteins lacking homologous templates in the Protein Data Bank (PDB) remains a significant unsolved problem. We developed a protocol, C-I-TASSER, that integrates interresidue contact maps predicted by deep neural-network learning with cutting-edge I-TASSER fragment assembly simulations. Large-scale benchmark tests showed that C-I-TASSER can fold more than twice as many non-homologous proteins as I-TASSER, which does not use contacts. When applied to a folding experiment on 8,266 unsolved Pfam families, C-I-TASSER successfully folded 4,162 domain families, including 504 folds that are not found in the PDB. Furthermore, it created correct folds for 85% of proteins in the SARS-CoV-2 genome, despite the virus's rapid mutation rate and sparse sequence profiles. The results demonstrated the critical importance of coupling whole-genome and metagenome-based evolutionary information with optimal structure assembly simulations for solving the problem of non-homologous protein structure prediction.
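One common way to couple a predicted contact map with an assembly simulation is to add a contact-restraint term to the simulation energy, so that conformations satisfying confidently predicted contacts score lower. The sketch below is illustrative only and is not the C-I-TASSER energy function: the logistic satisfaction curve, the 8 Å cutoff, the sequence-separation and probability thresholds, and the function names `contact_restraint_energy` and `metropolis_accept` are all hypothetical choices. I-TASSER itself samples conformations with replica-exchange Monte Carlo; the plain Metropolis criterion here is a simplification of that idea.

```python
import numpy as np


def contact_restraint_energy(coords, contact_prob, cutoff=8.0, width=1.0,
                             min_seq_sep=6, prob_threshold=0.5):
    """Hypothetical contact-restraint energy (lower is better).

    coords: (L, 3) array of residue coordinates (e.g. C-beta atoms), in Angstrom.
    contact_prob: (L, L) symmetric matrix of predicted contact probabilities.
    Each confident pair contributes -p_ij * s(d_ij), where s is a logistic
    "satisfaction" that is ~1 when d_ij << cutoff and ~0 when d_ij >> cutoff.
    """
    coords = np.asarray(coords, dtype=float)
    probs = np.asarray(contact_prob, dtype=float)
    n = coords.shape[0]
    # All pairwise Euclidean distances via broadcasting.
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # Only pairs separated by at least min_seq_sep residues along the chain.
    i, j = np.triu_indices(n, k=min_seq_sep)
    p, d = probs[i, j], dist[i, j]
    keep = p >= prob_threshold  # ignore low-confidence predictions
    # Clip the exponent to avoid overflow for very distant pairs.
    z = np.clip((d[keep] - cutoff) / width, -50.0, 50.0)
    satisfied = 1.0 / (1.0 + np.exp(z))
    return float(-(p[keep] * satisfied).sum())


def metropolis_accept(e_old, e_new, temperature, rng):
    """Plain Metropolis criterion on a total energy (simplified assumption)."""
    if e_new <= e_old:
        return True
    return rng.random() < np.exp(-(e_new - e_old) / temperature)


if __name__ == "__main__":
    # Toy 8-residue chain with one confident predicted contact between 0 and 7.
    probs = np.zeros((8, 8))
    probs[0, 7] = probs[7, 0] = 0.9
    extended = np.array([[3.8 * k, 0.0, 0.0] for k in range(8)])
    compact = extended.copy()
    compact[7] = [4.0, 0.0, 0.0]  # bring residue 7 next to residue 0
    e_ext = contact_restraint_energy(extended, probs)
    e_cmp = contact_restraint_energy(compact, probs)
    rng = np.random.default_rng(0)
    print(e_ext, e_cmp, metropolis_accept(e_ext, e_cmp, 1.0, rng))
```

In a full simulation this term would be added, with some weight, to the inherent knowledge-based energy before the acceptance test, so moves that satisfy predicted contacts are favored without forcing every prediction to be met.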

Authors

  • Wei Zheng
    School of Computer Engineering, Jinling Institute of Technology, Nanjing, 211169, China. zhengwei@jit.edu.cn.
  • Chengxin Zhang
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan.
  • Yang Li
    Occupation of Chinese Center for Disease Control and Prevention, Beijing, China.
  • Robin Pearce
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI 48109.
  • Eric W Bell
    Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan, United States of America.
  • Yang Zhang
    Innovative Institute of Chinese Medicine and Pharmacy, Academy for Interdiscipline, Chengdu University of Traditional Chinese Medicine, Chengdu, China.