Learning from label proportions on high-dimensional data.

Journal: Neural networks : the official journal of the International Neural Network Society
Published Date:

Abstract

Learning from label proportions (LLP), in which the training data are grouped into bags and only the proportion of each class in each bag is known, has attracted wide interest in machine learning. However, solving the LLP problem on high-dimensional data remains challenging. In this paper, we propose a novel algorithm, learning from label proportions based on random forests (LLP-RF), which is designed to handle the high-dimensional LLP problem. First, by defining the hidden class labels inside the target bags as random variables, we formulate a robust loss function based on random forests and incorporate the proportion information into LLP-RF by penalizing the difference between the ground-truth and estimated label proportions. Second, a simple but efficient alternating annealing method is employed to solve the resulting optimization model. Finally, experiments demonstrate that our algorithm achieves the best accuracy on high-dimensional data among several recently developed methods.
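
To make the proportion penalty concrete, the following is a minimal sketch of how a gap between known and estimated bag proportions could be measured. It is not the LLP-RF loss itself: the binary-label setting, the absolute-difference form of the penalty, and the names `estimated_proportion` and `proportion_penalty` are all assumptions made for illustration, and the toy bags are hypothetical.

```python
from typing import List, Sequence


def estimated_proportion(labels: Sequence[int]) -> float:
    """Fraction of instances in one bag currently assigned the positive label (1)."""
    if not labels:
        raise ValueError("a bag must contain at least one instance")
    return sum(1 for y in labels if y == 1) / len(labels)


def proportion_penalty(bag_labels: List[Sequence[int]],
                       true_props: Sequence[float]) -> float:
    """Sum over bags of |estimated proportion - known proportion|.

    Assumption: an absolute-difference penalty is used here for simplicity;
    the abstract does not state the exact form used by LLP-RF.
    """
    if len(bag_labels) != len(true_props):
        raise ValueError("need exactly one known proportion per bag")
    return sum(abs(estimated_proportion(labels) - p)
               for labels, p in zip(bag_labels, true_props))


# Hypothetical toy data: two bags of four instances with guessed hidden labels.
bags = [[1, 1, 0, 0], [1, 0, 0, 0]]
props = [0.5, 0.5]  # known positive-class proportion of each bag
penalty = proportion_penalty(bags, props)
```

In an alternating scheme of the kind the abstract describes, a quantity like `penalty` would be re-evaluated each time the hidden labels are updated, so that label assignments drifting away from the known bag proportions are discouraged.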

Authors

  • Yong Shi
    Research Center on Fictitious Economy and Data Science, Chinese Academy of Sciences, Beijing 100190, China; Key Laboratory of Big Data Mining and Knowledge Management, Chinese Academy of Sciences, Beijing 100190, China; College of Information Science and Technology, University of Nebraska at Omaha, Omaha, NE 68182, USA. Electronic address: yshi@ucas.ac.cn.
  • Jiabin Liu
    School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100190, China.
  • Zhiquan Qi
    Laboratory of Vehicle Engineering, School of Mechanical Engineering, Beijing Institute of Technology, Beijing 100081, China.
  • Bo Wang
    Department of Clinical Laboratory Medicine Center, Inner Mongolia Autonomous Region People's Hospital, Hohhot, Inner Mongolia, China.