Multiple/single-view human action recognition via part-induced multitask structural learning.

Journal: IEEE Transactions on Cybernetics
Published Date:

Abstract

This paper proposes a unified framework for multiple/single-view human action recognition. First, we propose a hierarchical partwise bag-of-words representation which encodes both local and global visual saliency based on the body structure cue. Then, we formulate multiple/single-view human action recognition as a part-regularized multitask structural learning (MTSL) problem, which has two advantages for both model learning and feature selection: 1) it preserves the consistency between body-based and part-based action classification by exploiting the complementary information among different action categories and multiple views and 2) it discovers both action-specific and action-shared feature subspaces to strengthen the generalization ability of model learning. Moreover, we contribute two novel human action recognition datasets, TJU (a single-view multimodal dataset) and MV-TJU (a multiview multimodal dataset). The proposed method is validated on three kinds of challenging datasets: two single-view RGB datasets (KTH and TJU), two well-known depth datasets (MSR Action 3-D and MSR Daily Activity 3-D), and one novel multiview multimodal dataset (MV-TJU). Extensive experimental results show that this method outperforms popular 2-D/3-D part model-based methods and several other competing methods for multiple/single-view human action recognition in both RGB and depth modalities. To our knowledge, this paper is the first to demonstrate the applicability of MTSL with part-based regularization to multiple/single-view human action recognition in both RGB and depth modalities.
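
The abstract does not state the MTSL objective itself. As a rough, illustrative sketch only (assumed notation, not the authors' exact formulation), a part-regularized multitask objective over T tasks (action categories and/or views), with per-task weights w_t = p_t + q_t stacked into matrices P and Q and with body-part feature groups g, could take a form such as

\[
\min_{P,\,Q}\; \sum_{t=1}^{T} L\big(p_t + q_t;\, X_t, y_t\big)
\;+\; \lambda_1 \lVert P \rVert_{2,1}
\;+\; \lambda_2 \lVert Q \rVert_{1}
\;+\; \lambda_3 \sum_{g} \Omega\big((P+Q)_g\big),
\]

where L is a per-task structural loss, the \(\ell_{2,1}\) penalty on P selects feature rows shared across tasks (the action-shared subspace), the \(\ell_1\) penalty on Q keeps sparse action-specific deviations, and \(\Omega\) is a group penalty over part-indexed feature blocks that couples body-level and part-level classification. The symbols T, P, Q, \(\lambda_i\), and \(\Omega\) are assumptions introduced here for illustration.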

Authors

  • An-An Liu
  • Yu-Ting Su
  • Ping-Ping Jia
  • Zan Gao
  • Tong Hao
  • Zhao-Xuan Yang