Protein classification using modified n-grams and skip-grams.

Journal: Bioinformatics (Oxford, England)

Published Date: May 1, 2018

Abstract

MOTIVATION: Classification by supervised machine learning greatly facilitates the annotation of protein characteristics from their primary sequence. However, the feature generation step in this process requires detailed knowledge of attributes used to classify the proteins. Lack of this knowledge risks the selection of irrelevant features, resulting in a faulty model. In this study, we introduce a supervised protein classification method with a novel means of automating the work-intensive feature generation step via a Natural Language Processing (NLP)-dependent model, using a modified combination of n-grams and skip-grams (m-NGSG).
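The abstract does not spell out how m-NGSG modifies standard n-grams and skip-grams, so the sketch below shows only the two plain building blocks it names. It counts contiguous n-grams and gapped residue pairs (skip-grams) in an amino-acid string written with single-letter codes. Everything here is illustrative and assumed: the function names (`ngram_counts`, `skipgram_counts`, `extract_features`, `to_vector`) are hypothetical, and so is the `"M.V"` notation, where each `.` marks one skipped residue. It is not the authors' implementation.

```python
from collections import Counter


def ngram_counts(seq, n):
    """Count contiguous n-grams (substrings of length n) in a protein sequence."""
    return Counter(seq[i:i + n] for i in range(len(seq) - n + 1))


def skipgram_counts(seq, max_gap):
    """Count residue pairs separated by 1..max_gap intervening residues.

    Hypothetical notation: each skipped position is written as '.', so
    'M.V' means M, any one residue, then V. Because '.' never occurs in a
    sequence, these keys cannot collide with ordinary n-gram keys.
    """
    counts = Counter()
    for gap in range(1, max_gap + 1):
        for i in range(len(seq) - gap - 1):
            counts[seq[i] + "." * gap + seq[i + gap + 1]] += 1
    return counts


def extract_features(seq, n_values=(1, 2, 3), max_gap=2):
    """Merge n-gram and skip-gram counts into one sparse feature dictionary.

    The defaults for n_values and max_gap are arbitrary assumptions,
    not parameters reported for m-NGSG.
    """
    seq = seq.upper()
    features = Counter()
    for n in n_values:
        features.update(ngram_counts(seq, n))
    features.update(skipgram_counts(seq, max_gap))
    return features


def to_vector(features, vocabulary):
    """Project sparse counts onto a fixed vocabulary (0 for absent terms)."""
    return [features.get(term, 0) for term in vocabulary]


if __name__ == "__main__":
    feats = extract_features("MKVLA", n_values=(1, 2), max_gap=1)
    print(to_vector(feats, ["MK", "M.V", "WW"]))
```

In a supervised setting, the vocabulary would be built from the training sequences, and the resulting fixed-length vectors could be passed to any standard classifier. Because the features come directly from the sequence, the user does not have to choose biologically meaningful attributes in advance, which is the burden the abstract describes.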

Authors

S M Ashiqul Islam

Institute of Biomedical Studies, Baylor University, Waco, TX, USA. S_Islam@Baylor.edu.
Benjamin J Heil

Department of Computer Science.
Christopher Michel Kearney

Institute of Biomedical Studies, Baylor University, Waco, TX, USA. Chris_Kearney@Baylor.edu.
Erich J Baker

Institute of Biomedical Studies, Baylor University, Waco, TX, USA. Erich_Baker@Baylor.edu.

Keywords

Models, Molecular; Molecular Sequence Annotation; Natural Language Processing; Protein Conformation; Proteins; Sequence Analysis, Protein; Supervised Machine Learning

External Resources

PubMed (29309523)