Analysis of protein determinants of genotype-specific properties of group A rotaviruses using machine learning.

Journal: Computers in biology and medicine
PMID:

Abstract

Group A rotaviruses (RVAs) are the leading cause of viral diarrhoea across various host species, including mammals and birds. The VP7 and VP4 proteins of these viruses play critical roles in determining genotype specificity, influencing viral infectivity and host adaptation. This study employed machine-learning techniques to classify RVA genotypes based on the molecular and physicochemical properties of these proteins. A dataset of 94 VP7 and 68 VP4 protein sequences was collected from various host species. Seven machine-learning algorithms were used for genotype classification: Naïve Bayes (NB), logistic regression (LR), decision tree (DT), random forest (RF), k-nearest neighbour (kNN), support vector machine (SVM), and artificial neural network (ANN). Feature subsets were configured using ranking-based attribute selection, and classification performance was evaluated using accuracy (ACC), precision, recall, Matthews' correlation coefficient (MCC), and the area under the curve (AUC). kNN demonstrated the highest classification accuracy for both VP7 (ACC = 97.87%) and VP4 (ACC = 100%), outperforming NB, LR, DT, RF, SVM, and ANN. For VP7 sequences, key properties influencing genotype classification included hydrophobicity, normalised van der Waals volume, and leucine composition. For VP4, polarity, normalised van der Waals volume, and polarizability were the most significant factors. In summary, the genotype-specific molecular features of VP7 and VP4 proteins served as reliable markers for RVA classification. Our findings highlight the potential of machine-learning approaches to predict RVA genotypes based on the physicochemical properties of amino acids, providing valuable insights into the molecular mechanisms that drive viral evolution, host specificity, and immune evasion.
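
The workflow summarised in the abstract (sequence-derived features, kNN classification, accuracy and MCC) can be illustrated with a minimal sketch. This is not the authors' pipeline: amino acid composition stands in for the full set of physicochemical descriptors, the toy sequences and the labels `G_x` and `G_y` are invented for illustration, and leave-one-out evaluation is an assumption, since the abstract does not state the validation scheme. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues

def composition(seq):
    """Fraction of each standard amino acid in seq (a simple stand-in feature set)."""
    counts = Counter(seq.upper())
    total = sum(counts[a] for a in AMINO_ACIDS)
    return [counts[a] / total if total else 0.0 for a in AMINO_ACIDS]

def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training vectors (Euclidean distance)."""
    ranked = sorted(zip(train_X, train_y), key=lambda p: math.dist(p[0], x))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def loo_accuracy(X, y, k=1):
    """Leave-one-out accuracy (assumed scheme, not taken from the abstract)."""
    hits = 0
    for i in range(len(X)):
        pred = knn_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i], k)
        hits += pred == y[i]
    return hits / len(X)

def mcc_binary(y_true, y_pred, positive):
    """Matthews correlation coefficient for a two-class problem."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Toy data: invented sequences with hypothetical genotype labels.
toy = [
    ("LLLLAVLLIL", "G_x"),
    ("LLALLVLLLL", "G_x"),
    ("KKKDEKKRKK", "G_y"),
    ("KKDKKEKKRK", "G_y"),
]
X = [composition(seq) for seq, _ in toy]
y = [label for _, label in toy]

if __name__ == "__main__":
    print("leave-one-out accuracy:", loo_accuracy(X, y, k=1))
```

In this sketch the leucine fraction is simply `composition(seq)[AMINO_ACIDS.index("L")]`. Descriptors such as hydrophobicity or normalised van der Waals volume would be added as further columns of the feature vector before ranking-based attribute selection.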

Authors

  • Myeongji Cho
    Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Korea.
  • Nara Been
    Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea; Public Health AI Lab, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, South Korea.
  • Hyeon S Son
    Laboratory of Computational Virology & Viroinformatics, Graduate School of Public Health, Seoul National University, 1 Gwanak-ro, Gwanak-gu, Seoul, 08826, Korea. hss2003@snu.ac.kr.