A novel hierarchical selective ensemble classifier with bioinformatics application.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

Selective ensemble learning selects a subset of diverse and accurate base models in order to achieve stronger generalization ability. In this paper, we propose a novel learning algorithm based on parallel optimization and hierarchical selection (PTHS). To address the problem of high dimensionality, we also propose a novel feature selection method based on maximizing the sum of relevance and distance (MSRD). Specifically, the PTHS algorithm employs parallel optimization, candidate model pruning based on k-means, and a hierarchical selection framework. We combine the prediction results of the base models by majority voting, which employs a divide-and-conquer strategy to save computing time. In addition, the PT algorithm is capable of transforming a multi-class problem into a binary classification problem, thereby allowing our ensemble model to address multi-class problems. Empirical study shows that MSRD is efficient in solving the high dimensionality problem, and that PTHS exhibits better performance than other existing classification algorithms. Most importantly, our classifier achieved high-level performance on several bioinformatics problems (e.g., tRNA identification and protein-protein interaction prediction), demonstrating its efficiency and robustness.
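
The majority-voting step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `majority_vote`, the input layout (one list of predicted labels per model), and the tie-breaking rule are assumptions made only for illustration, and the divide-and-conquer and hierarchical selection parts of PTHS are not reproduced here.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-model label predictions by simple majority vote.

    predictions: list of lists, one inner list per model, each holding
    that model's predicted label for every sample (hypothetical layout).
    Ties go to the label seen first in model order (an assumption).
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*predictions) yields, for each sample, the votes of all models.
    for votes in zip(*predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Three hypothetical models voting on four samples.
model_outputs = [
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [1, 1, 0, 1],
]
ensemble_labels = majority_vote(model_outputs)
```

In a selective ensemble, only the models that survive the selection stage would contribute rows to `model_outputs`; the voting itself is unchanged.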

Authors

  • Leyi Wei
    School of Computer Science and Technology, Tianjin University, Tianjin, 30050, China.
  • Shixiang Wan
    School of Computer Science and Technology, Tianjin University, Tianjin, China.
  • Jiasheng Guo
    School of Information Science and Technology, Xiamen University, Xiamen 361005, China.
  • Kelvin Kl Wong
    School of Medicine, Western Sydney University, Sydney, Australia. Electronic address: Kelvin.Wong@westernsydney.edu.au.