MSLVP: prediction of multiple subcellular localization of viral proteins using a support vector machine.

Journal: Molecular bioSystems
Published Date:

Abstract

Knowledge of the subcellular location (SCL) of viral proteins in the host cell is important for understanding their function in depth. Therefore, we have developed "MSLVP", a two-tier prediction algorithm for predicting multiple SCLs of viral proteins. For this study, data sets of comprehensive viral proteins with experimentally validated SCL annotation were collected from UniProt. Non-redundant (90%) data sets of 3480 viral proteins that belonged to single (2715), double (391) and multiple (374) sites were employed. Additionally, 1687 (30% sequence identity) viral proteins were categorised into single (1366), double (167) and multiple (154) sites. Single, double and multiple locations further comprised of eight, four and six categories, respectively. Viral protein locations include the nucleus, cytoplasm, endoplasmic reticulum, extracellular, single-pass membrane, multi-pass membrane, capsid, remaining others and combinations thereof. Support vector machine based models were developed using sequence features like amino acid composition, dipeptide composition, physicochemical properties and their hybrids. We have employed "one-versus-one" as well as "one-versus-other" strategies for multiclass classification. The performance of "one-versus-one" is better than the "one-versus-other" approach during 10-fold cross-validation. For the 90% data set, we achieved an accuracy, a Matthew's correlation coefficient (MCC) and a receiver operating characteristic (ROC) of 99.99%, 1.00, 1.00; 100.00%, 1.00, 1.00 and 99.90%; 1.00, 1.00 for single, double and multiple locations, respectively. Similar results were achieved for a 30% sequence identity data set. Predictive models for each SCL performed equally well on the independent dataset. The MSLVP web server () can predict subcellular locations i.e. single (8; including single and multi-pass membrane), double (4) and multiple (6). This would be helpful for elucidating the functional annotation of viral proteins and potential drug targets.

Authors

  • Anamika Thakur
    Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India.
  • Akanksha Rajput
    Virology Unit, Institute of Microbial Technology, Council of Scientific and Industrial Research (CSIR), Sector 39A, Chandigarh 160036, India.
  • Manoj Kumar
    Department of Pharmaceutical Sciences and Drug Research, Punjabi University Patiala Punjab 147002 India mmlpup73@gmail.com +91 17522 83075 +91 95015 42696.