ProtPlat: an efficient pre-training platform for protein classification based on FastText.

Journal: BMC bioinformatics
Published Date:

Abstract

BACKGROUND: Over the past decades, the rapid growth of protein sequence data in public databases has driven the development of many machine learning methods that predict the physicochemical properties or functions of proteins from amino acid sequence features. However, prediction performance often suffers from a lack of labeled data. In recent years, pre-training methods have been widely studied as a way to address the small-sample problem in computer vision and natural language processing, whereas pre-training techniques designed specifically for protein sequences remain scarce.

Authors

  • Yuan Jin
  • Yang Yang
    Department of Gastrointestinal Surgery, The Third Hospital of Hebei Medical University, Shijiazhuang, China.