APLpred: A machine learning-based tool for accurate prediction and characterization of asparagine peptide lyases using sequence-derived optimal features.

Journal: Methods (San Diego, Calif.)
Published Date:

Abstract

Asparagine peptide lyases (APLs) are one of the seven groups of proteases (also known as proteolytic enzymes), which are classified according to their catalytic residue. APLs are synthesized as precursors or propeptides that undergo self-cleavage through an autoproteolytic reaction. At present, APLs are grouped into 10 families belonging to six different clans of proteases. Given their critical roles in many biological processes, including virus maturation and virulence, accurate identification and characterization of APLs is indispensable. Experimental identification and characterization of APLs, however, is laborious and time-consuming. Here, we developed APLpred, a novel support vector machine (SVM) based predictor that predicts APLs from primary sequences. Optimal features were selected with Boruta from seven encodings and then used to train models with five machine learning algorithms. After evaluating each model on an independent dataset, we selected the SVM-based model as APLpred because it performed consistently in both cross-validation and independent evaluation. We anticipate that APLpred will be an effective tool for identifying APLs, which could aid in designing inhibitors against these enzymes and in exploring their functions. The APLpred web server is freely available at https://procarb.org/APLpred/.
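
As an illustration of what a sequence-derived feature can look like, the following minimal Python sketch computes amino acid composition (the fraction of each of the 20 standard residues) from a primary sequence. The abstract does not name the seven encodings used by APLpred, so amino acid composition is an assumption chosen only because it is a common encoding. The function name and the toy sequence are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so feature vectors are comparable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence: str) -> list:
    """Return 20 fractions, one per standard residue, for a protein sequence.

    Illustrative encoding only; not confirmed to be one of APLpred's seven.
    Non-standard characters (e.g. X) are ignored.
    """
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


# Hypothetical toy sequence, not a real APL.
example = "MNNAAK"
features = amino_acid_composition(example)
```

A fixed-length vector like this is the kind of input that a feature selector and a classifier such as an SVM can consume, regardless of the length of the original sequence.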

Authors

  • Adeel Malik
    Department of Microbiology and Molecular Biology, College of Bioscience and Biotechnology, Chungnam National University, Daejeon 34134, Korea. adeel@procarb.org.
  • Majid Rasool Kamli
    Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia.
  • Jamal S M Sabir
    Department of Biological Sciences, Faculty of Science, King Abdulaziz University, Jeddah 21589, Saudi Arabia; Center of Excellence in Bionanoscience Research, King Abdulaziz University, Jeddah 21589, Saudi Arabia. Electronic address: jsabir@kau.edu.sa.
  • Irfan A Rather
    Department of Applied Microbiology and Biotechnology, Yeungnam University, Gyeongsan, South Korea.
  • Le Thi Phan
    Computational Biology and Bioinformatics Laboratory, Department of Integrative Biotechnology, College of Biotechnology and Bioengineering, Sungkyunkwan University, Suwon 16419, Gyeonggi-do, Republic of Korea.
  • Chang-Bae Kim
    Department of Biotechnology, Sangmyung University, Seoul, 03016, Korea. evodevo@smu.ac.kr.
  • Balachandran Manavalan
    Department of Physiology, Ajou University School of Medicine, Suwon, Republic of Korea.