Prediction of zinc-binding sites using multiple sequence profiles and machine learning methods.

Journal: Molecular Omics
Published Date:

Abstract

The zinc (Zn) cofactor is involved in numerous biological mechanisms, and zinc binding is recognized as one of the most important post-translational modifications in proteins. Accurate knowledge of zinc ions in protein structures can therefore provide clues for elucidating protein folding and function. However, determining zinc-binding residues experimentally is usually labor-intensive and costly. In this context, computational tools for identifying zinc-binding sites are highly desirable, especially in the current post-genomic era. In this work, we developed a novel zinc-binding site prediction method by combining several intensively trained machine learning models. To establish an accurate and generalizable method, we downloaded all zinc-binding proteins from the Protein Data Bank and prepared a non-redundant dataset. A well-prepared dataset from other groups was also used. Effective and complementary features were then extracted from the sequences and three-dimensional structures of these proteins, and several well-designed machine learning models were intensively trained on them. To assess performance, the resulting predictors were stringently benchmarked on diverse zinc-binding sites. Several state-of-the-art in silico methods developed specifically for zinc-binding sites were also evaluated and compared. The results confirmed that our method is very competitive in real-world applications and could become a complementary tool to wet-lab experiments. To facilitate research in the community, a web server and a stand-alone program implementing our method were constructed and are publicly available at . The downloadable program can be easily used for high-throughput screening of potential zinc-binding sites across proteomes.
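
The abstract does not specify the features, the models, or how the models are combined. As a rough illustration of the general idea only, the sketch below builds sliding-window features from a per-residue sequence profile and averages the scores of several models. Everything in it is an assumption made for illustration: the function names, the candidate residue set (Cys, His, Asp, Glu), the window size, the threshold, and the score-averaging rule are all hypothetical. It also covers only sequence-profile features, not the structure-derived features the abstract mentions.

```python
from typing import Callable, List, Sequence, Tuple

# Assumption: restrict candidates to residue types that commonly coordinate zinc.
CANDIDATES = set("CHDE")

Profile = Sequence[Sequence[float]]          # one row of profile scores per residue
Model = Callable[[List[float]], float]       # hypothetical: features -> score in [0, 1]


def window_features(profile: Profile, center: int, half_width: int) -> List[float]:
    """Flatten the profile rows in a window around `center`, zero-padding past the ends."""
    width = len(profile[0])
    feats: List[float] = []
    for i in range(center - half_width, center + half_width + 1):
        if 0 <= i < len(profile):
            feats.extend(profile[i])
        else:
            feats.extend([0.0] * width)
    return feats


def ensemble_score(models: Sequence[Model], feats: List[float]) -> float:
    """Combine models by plain score averaging (an assumed combination rule)."""
    return sum(m(feats) for m in models) / len(models)


def predict_sites(sequence: str, profile: Profile, models: Sequence[Model],
                  half_width: int = 2, threshold: float = 0.5) -> List[Tuple[int, str, float]]:
    """Return (0-based index, residue, score) for candidates scoring at or above threshold."""
    hits = []
    for i, aa in enumerate(sequence):
        if aa not in CANDIDATES:
            continue
        score = ensemble_score(models, window_features(profile, i, half_width))
        if score >= threshold:
            hits.append((i, aa, score))
    return hits


if __name__ == "__main__":
    # Toy one-column profile and two stand-in "models"; real models would be trained.
    toy_profile = [[0.0], [1.0], [1.0], [0.0]]
    toy_models = [lambda f: sum(f) / len(f), lambda f: max(f)]
    print(predict_sites("ACHG", toy_profile, toy_models, half_width=1))
```

In practice the stand-in callables would be replaced by trained classifiers, and the profile rows would come from a multiple-sequence-alignment tool; neither choice is stated in the abstract.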

Authors

  • Renxiang Yan
    School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China, and Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou 350002, China. yanrenxiang@fzu.edu.cn, ljuan@fzu.edu.cn
  • XiaoFeng Wang
    Indiana University Bloomington.
  • Yarong Tian
    Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530, Sweden.
  • Jing Xu
    First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China.
  • Xiaoli Xu
    School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China. yanrenxiang@fzu.edu.cn, ljuan@fzu.edu.cn
  • Juan Lin
    Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou University, Fuzhou, China.