Cascade interpolation learning with double subspaces and confidence disturbance for imbalanced problems.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

In this paper, a new ensemble framework named Cascade Interpolation Learning with Double subspaces and Confidence disturbance (CILDC) is designed for imbalanced classification problems. Developed from the Cascade Forest of the Deep Forest, a stacking-based tree ensemble for large-scale data with few hyper-parameters, CILDC aims to generalize the cascade model to a wider range of base classifiers. Specifically, CILDC integrates base classifiers through a double-subspaces strategy and random under-sampling preprocessing. Furthermore, a simple but effective confidence disturbance technique is introduced into CILDC to correct the threshold deviation caused by imbalanced samples. In detail, disturbance coefficients are multiplied with the confidence vectors before interpolation at each level of CILDC, so that the ideal threshold can be learned adaptively through the cascade structure. Both Random Forest and Naive Bayes are suitable base classifiers for CILDC. Comprehensive comparison experiments on typical imbalanced datasets demonstrate both the effectiveness and the generalization ability of CILDC.
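
The following is a minimal Python sketch of one cascade level combining the ideas described above (random under-sampling, a random feature subspace per base learner, and confidence vectors scaled by disturbance coefficients before being concatenated with the original features). It uses scikit-learn base learners for illustration; the names undersample, cascade_level, disturbance, and subspace_ratio are assumptions for this sketch, not the authors' implementation.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

rng = np.random.default_rng(0)

def undersample(X, y):
    """Random under-sampling: keep min-class count samples from every class."""
    classes, counts = np.unique(y, return_counts=True)
    n_min = counts.min()
    keep = np.concatenate([
        rng.choice(np.flatnonzero(y == c), size=n_min, replace=False)
        for c in classes
    ])
    return X[keep], y[keep]

def cascade_level(X, y, X_next, disturbance=(1.2, 0.8), subspace_ratio=0.7):
    """One illustrative cascade level: two base learners are trained on
    balanced data over random feature subspaces; their confidence vectors
    are scaled by the disturbance coefficients and concatenated with the
    original features to form the input of the next level."""
    n_feat = X.shape[1]
    k = max(1, int(subspace_ratio * n_feat))
    augmented = [X_next]
    for clf in (RandomForestClassifier(n_estimators=50, random_state=0),
                GaussianNB()):
        cols = rng.choice(n_feat, size=k, replace=False)   # random feature subspace
        Xs, ys = undersample(X[:, cols], y)                # class-balanced training set
        clf.fit(Xs, ys)
        proba = clf.predict_proba(X_next[:, cols])         # confidence vectors
        augmented.append(proba * np.asarray(disturbance))  # confidence disturbance
    return np.hstack(augmented)                            # features for the next level

In a full cascade, this level would be applied repeatedly, each time feeding the augmented feature matrix into the next level, with the disturbance coefficients letting the decision threshold drift toward the minority class as described in the abstract.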

Authors

  • Zhe Wang
    Department of Pathology, The Eighth Affiliated Hospital, Sun Yat-sen University, Shenzhen 518033, China.
  • Chenjie Cao
    Department of Computer Science and Engineering, East China University of Science and Technology, Shanghai, 200237, China.