Gene Classification Based on Multi-Class SVMs with Systematic Sampling and Hierarchical Clustering (SSHC) Algorithm.

Journal: Advances in experimental medicine and biology
Published Date:

Abstract

Support vector machines (SVMs) are machine learning algorithms with high classification accuracy. However, SVM training is computationally expensive, which makes the algorithm inefficient on large datasets. In this study, we use multi-class support vector machines combined with systematic sampling and hierarchical clustering (SSHC-MCSVM) to classify gene expression data. Gene expression profiles are treated as large datasets; the two datasets used in this study come from obese and lean individuals. In the proposed SSHC-MCSVM algorithm, the gene expression data are regrouped into new sets of genes by the systematic sampling with hierarchical clustering (SSHC) algorithm. The SSHC algorithm is repeated n times, and the k-partitions whose clusters have a high adjusted Rand index (ARI) are chosen. Multi-class support vector machines are then applied to the best regrouped gene expression data to classify the significant genes. Performance is measured by accuracy, recall, and precision. The proposed SSHC-MCSVM algorithm classified the significant genes with high accuracy, recall, and precision.
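
The sampling and selection steps named in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the abstract does not say what the ARI is computed against, so the sketch assumes each candidate partition is scored against a reference labeling, and the function names (systematic_sample, adjusted_rand_index, best_partition) are hypothetical. The hierarchical clustering and multi-class SVM stages are omitted.

```python
from collections import Counter
from math import comb


def systematic_sample(items, k, start=0):
    """Systematic sampling: keep every k-th item, beginning at index `start`."""
    if k <= 0 or not 0 <= start < k:
        raise ValueError("need k > 0 and 0 <= start < k")
    return items[start::k]


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: treated as trivially identical

    # Pairs of items placed together in both labelings, in A, and in B.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings are a single cluster
    return (together_both - expected) / (maximum - expected)


def best_partition(candidates, reference):
    """Assumed selection rule: keep the candidate with the highest ARI
    against a reference labeling (the abstract does not specify this)."""
    return max(candidates, key=lambda labels: adjusted_rand_index(labels, reference))
```

For example, systematic_sample(list(range(10)), 3, 1) keeps the items at indices 1, 4, and 7. In the setting of the abstract, the candidates passed to best_partition would be the cluster labelings produced by the n repetitions of SSHC.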

Authors

  • Nwayyin Najat Mohammed
    University of Sulaimani, College of Science, Computer Department, Sulaymaniyah, Iraq. nawing1@gmail.com.