ViroNia: LSTM based proteomics model for precise prediction of HCV.
Journal: Computers in biology and medicine
PMID: 39733555
Abstract
Classification of viruses has important implications for understanding their evolution and for designing interventions. This study introduces ViroNia, a novel LSTM-based system for high-accuracy classification of viral proteins. Although LSTM architectures were originally developed for generative tasks, they have also proven highly effective for classification, and ViroNia demonstrates this capability. It outperforms other deep architectures, namely Simple RNN, GRU, 1D CNN, and Bidirectional LSTM, owing to its use of pairwise sequence similarity and efficient data handling. Trained on a dataset of 2250 protein sequences drawn from the NCBI and BVBRC databases, ViroNia achieves accuracies of 99.7 % and 99.6 % for broad- and detail-level classification, respectively. Fivefold cross-validation yielded average accuracies of 92.29 % (±1.55 %) at the broad level and 90.31 % (±5.41 %) at the detail level. The architecture supports real-time data processing and automatic feature extraction, addressing the scalability limitations of BLAST (Basic Local Alignment Search Tool). The comparative analysis showed that, although the existing deep learning models share similar training parameters, ViroNia significantly improved classification outcomes. It is particularly suited to applications that demand real-time analysis and learning on additional viral protein datasets, and thus contributes broadly to ongoing viral research.
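The abstract does not specify ViroNia's embedding, layer sizes, sequence length, or training procedure, so the following is only a minimal, hypothetical sketch of the general technique it names: an LSTM that reads an integer-encoded protein sequence and outputs class probabilities, plus a fivefold split and the mean ± standard deviation summary used to report cross-validation accuracy. Everything here is an assumption for illustration: the names `encode`, `TinyLSTMClassifier`, `kfold_indices`, and `mean_std`, the 20-letter amino-acid encoding with 0 as padding, all dimensions, and the random untrained weights. The abstract also does not say whether the reported ± values are sample or population standard deviations; the sketch assumes sample SD.

```python
import math
import random

# Assumed vocabulary: the 20 standard amino acids; token 0 is reserved
# for padding and any non-standard residue (hypothetical encoding).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
AA_INDEX = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def encode(seq, max_len):
    """Map a protein sequence to integer tokens, truncated or zero-padded to max_len."""
    ids = [AA_INDEX.get(aa, 0) for aa in seq.upper()[:max_len]]
    return ids + [0] * (max_len - len(ids))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class TinyLSTMClassifier:
    """Hypothetical single-layer LSTM with a softmax head; weights are random, not trained."""

    def __init__(self, vocab_size, embed_dim, hidden_dim, n_classes, seed=0):
        rng = random.Random(seed)

        def mat(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.hidden_dim = hidden_dim
        self.embed = mat(vocab_size, embed_dim)
        # One weight matrix per gate (input, forget, candidate, output),
        # each applied to the concatenation [x_t; h_{t-1}].
        self.W = {g: mat(hidden_dim, embed_dim + hidden_dim) for g in "ifgo"}
        self.b = {g: [0.0] * hidden_dim for g in "ifgo"}
        self.W_out = mat(n_classes, hidden_dim)

    def _gate(self, name, z, act):
        return [act(sum(w * v for w, v in zip(row, z)) + bias)
                for row, bias in zip(self.W[name], self.b[name])]

    def forward(self, token_ids):
        """Run the LSTM over the tokens and return class probabilities."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for t in token_ids:
            if t == 0:  # ignore padding positions
                continue
            z = self.embed[t] + h  # list concatenation: [x_t; h_{t-1}]
            i = self._gate("i", z, sigmoid)
            f = self._gate("f", z, sigmoid)
            g = self._gate("g", z, math.tanh)
            o = self._gate("o", z, sigmoid)
            c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
            h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        logits = [sum(w * v for w, v in zip(row, h)) for row in self.W_out]
        top = max(logits)  # subtract the max for a numerically stable softmax
        exps = [math.exp(x - top) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]


def kfold_indices(n, k=5, seed=0):
    """Shuffle 0..n-1 and return k (train, test) index splits."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    splits = []
    for fold in range(k):
        test = idx[fold::k]
        test_set = set(test)
        train = [j for j in idx if j not in test_set]
        splits.append((train, test))
    return splits


def mean_std(values):
    """Mean and sample standard deviation, e.g. of per-fold accuracies."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return m, sd


if __name__ == "__main__":
    model = TinyLSTMClassifier(vocab_size=21, embed_dim=8, hidden_dim=16, n_classes=3)
    print(model.forward(encode("MSTNPKPQRK", max_len=50)))
    splits = kfold_indices(2250, k=5)
    print([len(test) for _, test in splits])  # five test folds of 450 sequences each
```

With 2250 sequences, each of the five folds holds out 450 sequences and trains on the remaining 1800; the per-fold accuracies of a trained model would then be summarized with `mean_std` to produce figures in the "mean (± SD)" form the abstract reports. A real implementation would use a deep learning framework with trained weights rather than this pure-Python forward pass.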