PlantPathoPPI: An Ensemble-based Machine Learning Architecture for Prediction of Protein-Protein Interactions between Plants and Pathogens.

Journal: Journal of molecular biology
Published Date:

Abstract

This study aimed to develop a machine learning-based tool for predicting protein-protein interactions (PPIs) in plant-pathogen systems, addressing the challenges of experimental PPI identification. Identifying PPIs between plants and pathogens is crucial for understanding the molecular mechanisms underlying plant defense and pathogen virulence. However, experimental methods are time-consuming and labor-intensive, prompting the use of computational techniques to complement traditional approaches. A robust ensemble model was developed using multiple sequence encodings and diverse learning algorithms, including random forest, support vector machine, and artificial neural network. The sequence encodings evaluated were the auto-covariance, conjoint triad, and local descriptor schemes, which were selected based on their performance. The top three performing models were combined into an ensemble model, improving prediction accuracy to approximately 97%. The resulting tool, PlantPathoPPI, was compared with existing tools on an independent test dataset and showed promising potential for PPI prediction in plant-pathogen interactions. To facilitate broad accessibility, a web-based prediction server was developed, available at https://plantpathoppi.onrender.com/, alongside a Python package at https://pypi.org/project/plantpathoppi-ml/. This research contributes an efficient tool for predicting PPIs in plant-pathogen systems, providing valuable insights into plant diseases and supporting hypothesis-driven research.
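
As an illustration of one of the encodings named in the abstract, the following is a minimal sketch of a conjoint triad encoder for a plant-pathogen protein pair, followed by a simple majority vote over three model outputs. It is not the PlantPathoPPI implementation. The seven-class amino acid grouping and the (f - min) / max normalization follow the commonly used conjoint triad formulation; the function names (`conjoint_triad`, `encode_pair`, `majority_vote`), the concatenation of the two protein vectors, and the use of majority voting to combine models are assumptions, since the abstract does not state how the features are paired or how the ensemble combines its members.

```python
from collections import Counter
from itertools import product

# Seven amino acid classes commonly used by the conjoint triad scheme.
CLASSES = ["AGV", "ILFP", "YMTS", "HNQW", "RK", "DE", "C"]
AA_TO_CLASS = {aa: idx for idx, group in enumerate(CLASSES) for aa in group}

# All 7 * 7 * 7 = 343 possible class triads, in a fixed order.
TRIADS = list(product(range(len(CLASSES)), repeat=3))


def conjoint_triad(seq):
    """Return the 343-dimensional conjoint triad vector of one protein sequence."""
    # Assumption: residues outside the 20 standard amino acids are skipped,
    # which makes their neighbours adjacent in the class sequence.
    classes = [AA_TO_CLASS[aa] for aa in seq.upper() if aa in AA_TO_CLASS]
    # Count every window of three consecutive residue classes.
    counts = Counter(zip(classes, classes[1:], classes[2:]))
    freqs = [counts.get(triad, 0) for triad in TRIADS]
    lo, hi = min(freqs), max(freqs)
    if hi == 0:
        # Sequence shorter than three usable residues: no triads to count.
        return [0.0] * len(TRIADS)
    return [(f - lo) / hi for f in freqs]


def encode_pair(plant_seq, pathogen_seq):
    """Concatenate both protein vectors into one 686-dimensional pair vector (assumed pairing)."""
    return conjoint_triad(plant_seq) + conjoint_triad(pathogen_seq)


def majority_vote(labels):
    """Combine binary predictions (0 or 1) from several models by majority vote (assumed rule)."""
    return 1 if 2 * sum(labels) > len(labels) else 0
```

The pair vector returned by `encode_pair` could be passed to any classifier. `majority_vote([1, 0, 1])`, for instance, labels a pair as interacting when two of three hypothetical models agree.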

Authors

  • Sneha Murmu
    Division of Agricultural Bioinformatics, ICAR-Indian Agricultural Statistics Research Institute, New Delhi, India.
  • Himanshushekhar Chaurasia
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India; ICAR-Indian Agricultural Research Institute, New Delhi 110012, India; ICAR-Central Institute for Research on Cotton Technology, Mumbai 400019, India.
  • A R Rao
    Indian Council of Agricultural Research, New Delhi 110001, India.
  • Anil Rai
    Indian Council of Agricultural Research, New Delhi 110001, India.
  • Sarika Jaiswal
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
  • Anshu Bharadwaj
    ICAR-Indian Agricultural Statistics Research Institute, New Delhi 110012, India.
  • Rajbir Yadav
    ICAR-Indian Agricultural Research Institute, New Delhi 110012, India.
  • Sunil Archak
    ICAR-National Bureau of Plant Genetic Resources, New Delhi 110012, India. Electronic address: sunil.archak@icar.gov.in.