Prediction of matrilineal specific patatin-like protein governing in-vivo maternal haploid induction in maize using support vector machine and di-peptide composition.

Journal: Amino acids
Published Date:

Abstract

The mutant matrilineal (mtl) gene encoding patatin-like phospholipase activity is involved in in-vivo maternal haploid induction in maize. Doubling of chromosomes in haploids by colchicine treatment leads to complete fixation of inbreds in just one generation, compared to 6-7 generations of selfing. Thus, knowledge of patatin-like proteins in other crops assumes great significance for in-vivo haploid induction. So far, no online tool is available that can classify unknown proteins as patatin-like proteins. Here, we aimed to optimize a machine learning-based algorithm to predict the patatin-like phospholipase activity of unknown proteins. Four different kernels [radial basis function (RBF), sigmoid, polynomial, and linear] were used for building support vector machine (SVM) classifiers using six different sequence-based compositional features (AAC, DPC, GDPC, CTDC, CTDT, and GAAC). A total of 1170 protein sequences were analyzed: 585 patatin-like proteins from various monocots, dicots, and microbes, and 585 non-patatin-like proteins from different subspecies of Zea mays. RBF and polynomial kernels were quite promising in the prediction of patatin-like proteins. Among the six sequence-based compositional features, di-peptide composition attained >90% prediction accuracy using RBF and polynomial kernels. Using mutual information, the dipeptides that contributed most to the prediction process were identified. The knowledge generated in this study can be utilized in other crops prior to the initiation of any experiment. The developed SVM model opens a new paradigm for scientists working on in-vivo haploid induction in commercial crops. This is the first report of machine learning for the identification of proteins with patatin-like activity.
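
The di-peptide composition (DPC) feature named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formula, so this assumes the commonly used definition: the count of each of the 400 ordered amino-acid pairs divided by the number of overlapping pairs in the sequence. The function name `dipeptide_composition`, the handling of non-standard residues, and the toy sequence are illustrative assumptions, not the authors' code or data.

```python
from itertools import product

# The 20 standard amino acids (one-letter codes) and all 400 ordered pairs.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]


def dipeptide_composition(sequence):
    """Return a 400-element DPC vector for one protein sequence.

    Assumption: each entry is the count of that dipeptide divided by the
    number of overlapping pairs made of standard residues. Pairs containing
    any other symbol (e.g. 'X') are skipped.
    """
    seq = sequence.upper()
    counts = dict.fromkeys(DIPEPTIDES, 0)
    total = 0
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        if pair in counts:
            counts[pair] += 1
            total += 1
    if total == 0:
        # Sequence too short or with no valid pairs: return an all-zero vector.
        return [0.0] * len(DIPEPTIDES)
    return [counts[d] / total for d in DIPEPTIDES]


# Toy sequence (hypothetical, not from the study): pairs are MA, AA, AA, AK.
toy_vector = dipeptide_composition("MAAAK")
aa_fraction = toy_vector[DIPEPTIDES.index("AA")]
```

Each protein then becomes a fixed-length numeric vector regardless of its length, which is the form an SVM classifier requires as input.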

Authors

  • Suman Dutta
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Rajkumar U Zunjare
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Anirban Sil
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Dwijesh Chandra Mishra
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India.
  • Alka Arora
    Division of Computer Applications, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, 110012, India.
  • Nisrita Gain
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Gulab Chand
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Rashmi Chhabra
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Vignesh Muthusamy
    ICAR-Indian Agricultural Research Institute, New Delhi, India.
  • Firoz Hossain
    ICAR-Indian Agricultural Research Institute, New Delhi, India. fh_gpb@yahoo.com.