A simple and reliable instance selection for fast training support vector machine: Valid Border Recognition.

Journal: Neural networks : the official journal of the International Neural Network Society

Abstract

Support vector machines (SVMs) are powerful statistical learning tools, but training them on large datasets can be very time-consuming. To address this issue, various instance selection (IS) approaches have been proposed that choose a small fraction of critical instances and screen out the rest before training. However, existing methods do not balance accuracy and efficiency well. Some methods miss critical instances, while others use complicated selection schemes that take even more execution time than training on all the original instances, which defeats the original purpose of IS. In this work, we present a new IS method called Valid Border Recognition (VBR). VBR selects the closest heterogeneous neighbors (neighbors from the opposite class) as valid border instances and incorporates this selection into the construction of a reduced Gaussian kernel matrix, thus minimizing the execution time. To improve reliability, we propose a strengthened version of VBR (SVBR). Starting from the VBR selection, SVBR gradually adds farther heterogeneous neighbors as complements until the Lagrange multipliers of the already selected instances stabilize. Numerical experiments on benchmark and synthetic datasets verify the effectiveness of the proposed methods in terms of accuracy, execution time, and inference time.
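The abstract gives no pseudocode, so the following is only a minimal NumPy sketch of the VBR idea under stated assumptions, not the authors' implementation. It assumes labels in {-1, +1}, takes "closest" to mean closest in the Gaussian kernel's feature space, and computes only the cross-class block of the kernel matrix. For a Gaussian kernel K(a, a) = 1, so the feature-space distance ||phi(a) - phi(b)||^2 equals 2 - 2K(a, b), and the nearest heterogeneous neighbor is the one with the largest kernel value. The function names `gaussian_cross_kernel` and `valid_border_sketch` and the parameters `gamma` and `k` are hypothetical.

```python
import numpy as np


def gaussian_cross_kernel(A, B, gamma):
    """Gaussian kernel block: K[i, j] = exp(-gamma * ||A[i] - B[j]||^2)."""
    sq = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-gamma * np.maximum(sq, 0.0))


def valid_border_sketch(X, y, gamma=1.0, k=1):
    """Hypothetical VBR-style selection (an assumption, not the paper's code).

    Keeps the k closest opposite-class neighbors of every instance and
    returns (selected_indices, reduced_kernel_matrix).
    """
    pos = np.flatnonzero(y == 1)
    neg = np.flatnonzero(y != 1)
    # Only the cross-class block is needed to find heterogeneous neighbors.
    K = gaussian_cross_kernel(X[pos], X[neg], gamma)  # shape (n_pos, n_neg)
    # Largest kernel value == smallest feature-space distance.
    nn_of_pos = np.argsort(-K, axis=1)[:, :k]  # column indices into `neg`
    nn_of_neg = np.argsort(-K, axis=0)[:k, :]  # row indices into `pos`
    selected = set(neg[nn_of_pos.ravel()].tolist()) | set(pos[nn_of_neg.ravel()].tolist())
    idx = np.array(sorted(selected), dtype=int)
    # Reduced kernel matrix over the selected instances only, for SVM training.
    K_red = gaussian_cross_kernel(X[idx], X[idx], gamma)
    return idx, K_red
```

For example, with one-dimensional points X = [[0], [1], [2], [3]] and labels y = [-1, -1, 1, 1], the sketch keeps only the two points that face each other across the class boundary (indices 1 and 2), and the returned reduced kernel matrix is 2 x 2. Raising `k` loosely mimics SVBR's addition of farther heterogeneous neighbors, but this sketch does not implement SVBR's stopping rule based on the stability of the Lagrange multipliers.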

Authors

  • Long Tang
    Research Institute of Extenics and Innovation Method, Guangdong University of Technology, Guangzhou, 510006, China; Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida, Gainesville, 32611, USA.
  • Yingjie Tian
    Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China. Electronic address: tyj@ucas.ac.cn.
  • Xiaowei Wang
    Beijing Centers for Preventive Medical Research, Beijing 100013, China.
  • Panos M Pardalos
    Center for Applied Optimization, Department of Industrial and Systems Engineering, University of Florida, Gainesville, 32611, USA.