Identifying the missing proteins in human proteome by biological language model.

Journal: BMC systems biology

Published Date: Dec 23, 2016

Abstract

BACKGROUND: With the rapid development of high-throughput sequencing technology, proteomics has become an active research field in the post-genomic era. Identifying all natively encoded protein sequences is necessary for further function and pathway analysis. Toward that end, the Human Proteome Organization launched the Human Proteome Project in 2011. However, many proteins are hard to detect by experimental methods, which has become one of the bottlenecks of the Human Proteome Project. Given the difficulty of detecting these missing proteins with wet-lab experiments, here we use a bioinformatics method to pre-filter the missing proteins.

Authors

Qiwen Dong

Institute for Data Science and Engineering, East China Normal University, Shanghai 200062, People's Republic of China.

Kai Wang

Department of Rheumatology, The Affiliated Huai'an No. 1 People's Hospital of Nanjing Medical University, Huai'an, Jiangsu, China.

Xuan Liu

Department of Electrical and Computer Engineering, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, USA.

Keywords

Databases, Protein; Gene Ontology; Humans; Intracellular Space; Models, Theoretical; Natural Language Processing; Probability; Protein Transport; Proteome; Proteomics

External Resources

PubMed ID: 28155671
