EIOFX-DT: Leveraging graph centrality metrics for feature extraction and classification of viral genetic sequences.

Journal: Biotechnology reports (Amsterdam, Netherlands)

Abstract

Many diseases have a genetic origin, and analyzing intracellular structures through genetic data yields specific features for the diagnosis and classification of viral disease samples. In this study, 30 types of viruses were analyzed using a graph-based approach on genetic data. Genomic sequences were modeled as graphs at the nucleotide scale, using concepts from graph theory and complex networks. Degree and eigenvector centrality metrics were employed to extract features, and a decision tree was used as the machine learning classifier on the resulting feature space. The results, presented as interpretable rules, enable the classification and identification of virus types in both binary and multi-class settings. The model achieved accuracy and F1 score exceeding 99% on more than 173,000 samples. Additionally, the feature extraction algorithm demonstrated robust performance across all datasets and classifiers.
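The feature extraction step can be illustrated with a minimal sketch. The abstract does not specify how the graph is built, so the construction here is an assumption: nodes are the four nucleotides, and each undirected edge is weighted by how often the two nucleotides appear next to each other in the sequence. The function names (`adjacency_from_sequence`, `eigenvector_centrality`, `centrality_features`) are hypothetical and are not taken from the paper.

```python
NUCLEOTIDES = "ACGT"


def adjacency_from_sequence(seq):
    """Build a symmetric weighted adjacency matrix over A, C, G, T.

    Assumed construction: each pair of adjacent nucleotides in the
    sequence adds 1 to the weight of the edge between them.
    Ambiguous symbols such as N are skipped.
    """
    index = {n: i for i, n in enumerate(NUCLEOTIDES)}
    size = len(NUCLEOTIDES)
    adj = [[0.0] * size for _ in range(size)]
    for a, b in zip(seq, seq[1:]):
        if a in index and b in index:
            i, j = index[a], index[b]
            adj[i][j] += 1.0
            if i != j:
                adj[j][i] += 1.0
    return adj


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Approximate eigenvector centrality by power iteration.

    The update uses (A + I) instead of A. This keeps the same
    eigenvectors but avoids oscillation on bipartite graphs.
    """
    n = len(adj)
    x = [1.0 / n] * n
    for _ in range(iterations):
        nxt = [x[i] + sum(adj[i][j] * x[j] for j in range(n)) for i in range(n)]
        norm = sum(v * v for v in nxt) ** 0.5
        nxt = [v / norm for v in nxt]
        if max(abs(p - q) for p, q in zip(nxt, x)) < tol:
            return nxt
        x = nxt
    return x


def centrality_features(seq):
    """Return 8 features: 4 normalized weighted degrees, then 4 eigenvector scores."""
    adj = adjacency_from_sequence(seq.upper())
    strengths = [sum(row) for row in adj]
    total = sum(strengths) or 1.0  # guard against sequences with no valid pairs
    degree = [s / total for s in strengths]
    return degree + eigenvector_centrality(adj)


features = centrality_features("ACGTACGTTTGA")
```

Each sequence is reduced to a fixed-length vector regardless of its length, so the vectors could then be passed to any decision tree implementation (for example scikit-learn's `DecisionTreeClassifier`) to obtain interpretable split rules.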
