ProtPlat: an efficient pre-training platform for protein classification based on FastText.
Journal: BMC Bioinformatics
Published Date: Feb 11, 2022
Abstract
BACKGROUND: Over the past decades, the rapid growth of protein sequence data in public databases has enabled the development of many machine learning methods that predict the physicochemical properties or functions of proteins from amino acid sequence features. However, prediction performance often suffers from a lack of labeled data. In recent years, pre-training methods have been widely studied to address this small-sample problem in computer vision and natural language processing, but few pre-training techniques have been designed specifically for protein sequences.