PrESOgenesis: A two-layer multi-label predictor for identifying fertility-related proteins using support vector machine and pseudo amino acid composition approach.

Journal: Scientific Reports
PMID:

Abstract

Successful spermatogenesis and oogenesis are two genetically independent processes that precede embryo development. To date, several fertility-related proteins have been described in mammalian species. Nevertheless, further studies are required to discover more proteins associated with germ-cell development and embryogenesis in order to shed more light on these processes. This work builds on our previous software (OOgenesis_Pred) and extends its algorithms beyond what was previously done. In particular, it predicts new fertility-related proteins and assigns them to classes (embryogenesis, spermatogenesis and oogenesis) using a support vector machine based on Chou's pseudo-amino acid composition features. The results of five-fold cross-validation, as well as of an independent test, demonstrated that this method can predict fertility-related proteins and their classes with an accuracy of more than 80%. Moreover, feature selection methods identified important properties of fertility-related proteins that allowed for their accurate classification. Based on the proposed method, a two-layer classifier software named "PrESOgenesis" (https://github.com/mrb20045/PrESOgenesis) was developed. The tool identifies a query sequence (protein or transcript) as a fertility-related or non-fertility-related protein at the first layer and then classifies each predicted fertility-related protein into the embryogenesis, spermatogenesis or oogenesis class at the second layer.
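The abstract does not state the feature settings, so the following is only a minimal sketch of Chou's type-1 (parallel-correlation) pseudo-amino acid composition and of a two-layer decision flow, not the PrESOgenesis implementation. It assumes λ = 3, a weight w = 0.05 and caller-supplied property tables. The names `pseaac`, `standardize` and `two_layer_predict` are hypothetical, the two classifiers are stand-in callables rather than trained SVMs, and the demo property table is a placeholder, not a real physicochemical scale.

```python
from statistics import mean, pstdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def standardize(prop):
    """Rescale a property table to zero mean and unit population SD over the 20 residues."""
    vals = [prop[a] for a in AMINO_ACIDS]
    mu, sd = mean(vals), pstdev(vals)
    return {a: (prop[a] - mu) / sd for a in AMINO_ACIDS}


def pseaac(seq, props, lam=3, w=0.05):
    """Type-1 pseudo amino acid composition: 20 composition terms + lam correlation terms."""
    # Keep only the 20 standard residues; other letters (X, B, *, ...) are dropped.
    residues = [a for a in seq.upper() if a in AMINO_ACIDS]
    n = len(residues)
    if n <= lam:
        raise ValueError("sequence must be longer than lam")
    std = [standardize(p) for p in props]
    # Normalised occurrence frequency of each standard amino acid.
    freqs = [residues.count(a) / n for a in AMINO_ACIDS]
    # theta_k: average squared property difference between residues k positions apart.
    thetas = []
    for k in range(1, lam + 1):
        total = 0.0
        for i in range(n - k):
            a, b = residues[i], residues[i + k]
            total += sum((h[b] - h[a]) ** 2 for h in std) / len(std)
        thetas.append(total / (n - k))
    denom = sum(freqs) + w * sum(thetas)
    return [f / denom for f in freqs] + [w * t / denom for t in thetas]


def two_layer_predict(features, is_fertility, class_of):
    """Layer 1 decides fertility vs non-fertility; layer 2 assigns the class."""
    if not is_fertility(features):
        return "non-fertility"
    return class_of(features)  # e.g. "embryogenesis", "spermatogenesis", "oogenesis"


if __name__ == "__main__":
    # Placeholder property: distinct values per residue, NOT a real physicochemical scale.
    toy_prop = {a: float(i) for i, a in enumerate(AMINO_ACIDS)}
    vec = pseaac("MKTAYIAKQRQISFVKSHFSRQ", [toy_prop], lam=3, w=0.05)
    print(len(vec), round(sum(vec), 6))  # 23 1.0
    # Stand-in classifiers; in practice these would be trained layer-1 and layer-2 models.
    print(two_layer_predict(vec, lambda v: True, lambda v: "oogenesis"))  # oogenesis
```

Standardising each property before computing the correlation terms keeps properties on a common scale, and w controls how much the sequence-order terms contribute relative to the plain composition; the resulting 20 + λ values always sum to 1.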

Authors

  • Mohammad Reza Bakhtiarizadeh
    Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, Tehran, Iran. mrbakhtiari@ut.ac.ir.
  • Maryam Rahimi
    Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, Tehran, Iran.
  • Abdollah Mohammadi-Sangcheshmeh
    Department of Animal and Poultry Science, College of Aburaihan, University of Tehran, Tehran, Iran.
  • Vahid Shariati J
    Genome Center, National Institute of Genetic Engineering and Biotechnology, Tehran, Iran.
  • Seyed Alireza Salami
    University of Tehran, Tehran, Iran. asalami@ut.ac.ir.