MiPepid: MicroPeptide identification tool using machine learning.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: Micropeptides are small proteins with a length of ≤ 100 amino acids. Short open reading frames (ORFs) that could produce micropeptides were traditionally ignored due to technical difficulties, as few small peptides had been experimentally confirmed. In the past decade, a growing number of micropeptides have been shown to play significant roles in vital biological activities. Despite the increased amount of data, we still lack bioinformatics tools for specifically identifying micropeptides from DNA sequences. Indeed, most existing tools for classifying coding and noncoding ORFs were built on datasets in which "normal-sized" proteins were considered to be positives and short ORFs were generally considered to be noncoding. Since the functional and biophysical constraints on small peptides are likely to be different from those on "normal" proteins, methods for predicting short translated ORFs must be trained independently from those for longer proteins.

Authors

  • Mengmeng Zhu
    Electric Power Research Institute, Yunnan Power Grid Co., Ltd., Kunming, Yunnan, China.
  • Michael Gribskov
    Department of Biological Sciences, Purdue University, West Lafayette, IN, USA.