Prediction of zinc-binding sites using multiple sequence profiles and machine learning methods.

Journal: Molecular Omics

Published Date: Jun 1, 2019

Abstract

The zinc (Zn) cofactor has been proven to be involved in numerous biological mechanisms, and the zinc-binding site is recognized as one of the most important post-translational modifications in proteins. Therefore, accurate knowledge of zinc ions in protein structures can provide potential clues for the elucidation of protein folding and function. However, determining zinc-binding residues by experimental means is usually labor-intensive and costly. In this context, the development of computational tools for identifying zinc-binding sites is highly desired, especially in the current post-genomic era. In this work, we developed a novel zinc-binding site prediction method by combining several intensively trained machine learning models. To establish an accurate and generalizable method, we downloaded all zinc-binding proteins from the Protein Data Bank and prepared a non-redundant dataset. In addition, a well-prepared dataset from other groups was also used. Then, effective and complementary features were extracted from the sequences and three-dimensional structures of these proteins. Moreover, several well-designed machine learning models were intensively trained to construct accurate models. To assess performance, the obtained predictors were stringently benchmarked on diverse zinc-binding sites. Furthermore, several state-of-the-art in silico methods developed specifically for zinc-binding sites were also evaluated and compared. The results confirmed that our method is very competitive in real-world applications and could become a complementary tool to wet-lab experiments. To facilitate research in the community, a web server and a stand-alone program implementing our method were constructed and are publicly available at . The downloadable program can be easily used for the high-throughput screening of potential zinc-binding sites across proteomes.
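
The abstract does not give implementation details, so the following is only a minimal sketch of how profile-based features and model combination of this kind are commonly set up, not the authors' method. Everything in it is an assumption made for illustration: the restriction to Cys, His, Asp and Glu as candidate residues, the sliding-window encoding, the plain averaging of model scores, and all function names.

```python
from typing import Dict, List, Sequence

# Assumption: only the residues that most often coordinate zinc are scored.
CANDIDATES = frozenset("CHDE")
ALPHABET = "ACDEFGHIKLMNPQRSTVWY"

# A profile is one {amino acid: score} column per sequence position.
Profile = List[Dict[str, float]]


def one_hot_profile(sequence: str) -> Profile:
    """Stand-in profile; a real one would come from a sequence alignment."""
    return [{aa: 1.0} for aa in sequence]


def candidate_sites(sequence: str) -> List[int]:
    """Return the 0-based positions of candidate zinc-binding residues."""
    return [i for i, aa in enumerate(sequence) if aa in CANDIDATES]


def window_features(profile: Profile, center: int, half_width: int) -> List[float]:
    """Flatten the profile columns in a window around `center`.

    Positions that fall off either end of the sequence are zero-padded.
    """
    feats: List[float] = []
    for pos in range(center - half_width, center + half_width + 1):
        if 0 <= pos < len(profile):
            column = profile[pos]
            feats.extend(column.get(aa, 0.0) for aa in ALPHABET)
        else:
            feats.extend(0.0 for _ in ALPHABET)
    return feats


def combine_scores(per_model_scores: Sequence[Sequence[float]]) -> List[float]:
    """Average the scores of several models, site by site (hypothetical rule)."""
    n_models = len(per_model_scores)
    return [sum(column) / n_models for column in zip(*per_model_scores)]


if __name__ == "__main__":
    seq = "MKCAHE"
    profile = one_hot_profile(seq)
    for site in candidate_sites(seq):
        vector = window_features(profile, site, half_width=1)
        print(site, seq[site], len(vector))
```

In a setup like this, each feature vector would be passed to the trained classifiers, and the per-site scores from the individual models would then be merged with a rule such as `combine_scores`.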

Authors

Renxiang Yan

School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China, and Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou 350002, China. yanrenxiang@fzu.edu.cn ljuan@fzu.edu.cn
XiaoFeng Wang

Indiana University Bloomington.
Yarong Tian

Institute of Biomedicine, Sahlgrenska Academy, University of Gothenburg, 40530, Sweden.
Jing Xu

First Department of Infectious Diseases, The First Affiliated Hospital of China Medical University, Shenyang, China.
Xiaoli Xu

School of Biological Sciences and Engineering, Fuzhou University, Fuzhou 350002, China. yanrenxiang@fzu.edu.cn ljuan@fzu.edu.cn.
Juan Lin

Fujian Key Laboratory of Marine Enzyme Engineering, Fuzhou University Fuzhou, China.

Keywords

Algorithms; Amino Acid Sequence; Binding Sites; Computational Biology; Computer Simulation; Databases, Protein; Machine Learning; Protein Binding; Protein Conformation; Protein Folding; Software; Support Vector Machine; Zinc

External Resources

PubMed (PMID: 31046040)
