Energy-Guided Temporal Segmentation Network for Multimodal Human Action Recognition.

Journal: Sensors (Basel, Switzerland)
Published Date:

Abstract

To achieve satisfactory performance in human action recognition, a central task is to address the sub-action sharing problem, especially among similar action classes. However, most existing convolutional neural network (CNN)-based action recognition algorithms uniformly divide videos into frames and then randomly select frames as inputs, ignoring the distinct characteristics of different frames. In recent years, depth videos have been increasingly used for action recognition, but most methods focus only on the spatial information of the different actions without exploiting temporal information. To address these issues, a novel energy-guided temporal segmentation method is proposed here, and a multimodal fusion strategy is combined with the proposed segmentation method to construct an energy-guided temporal segmentation network (EGTSN). Specifically, the EGTSN consists of two parts: energy-guided video segmentation and a multimodal fusion heterogeneous CNN. The proposed solution was evaluated on the public large-scale NTU RGB+D dataset. Comparisons with state-of-the-art methods demonstrate the effectiveness of the proposed network.
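The abstract does not give the exact segmentation rule, but the core idea of energy-guided segmentation can be sketched as follows: score each frame by a motion "energy" (here assumed to be the sum of squared differences between consecutive grayscale frames), split the video into segments carrying roughly equal cumulative energy instead of equal frame counts, and sample the most energetic frame from each segment. All function names and the specific energy measure below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def frame_energy(frames):
    """Per-frame motion energy for a (T, H, W) grayscale clip.

    Assumption: energy of frame t is the sum of squared pixel
    differences between frame t and frame t-1 (frame 0 gets 0).
    """
    diffs = np.diff(frames.astype(np.float64), axis=0)
    return np.concatenate([[0.0], (diffs ** 2).sum(axis=(1, 2))])

def energy_guided_segments(frames, num_segments):
    """Pick one representative frame index per segment.

    Segment boundaries are placed so each segment holds roughly an
    equal share of the clip's cumulative motion energy; within each
    segment, the highest-energy frame is selected.
    """
    n = len(frames)
    energy = frame_energy(frames)
    cum = np.cumsum(energy)
    total = cum[-1] if cum[-1] > 0 else 1.0
    # Equal-energy targets mapped back to frame indices.
    bounds = np.searchsorted(cum, np.linspace(0.0, total, num_segments + 1))
    bounds = np.clip(bounds, 0, n)
    bounds[0], bounds[-1] = 0, n
    picks = []
    for s, e in zip(bounds[:-1], bounds[1:]):
        s = min(int(s), n - 1)
        e = min(max(int(e), s + 1), n)  # guarantee a non-empty segment
        picks.append(s + int(np.argmax(energy[s:e])))
    return picks
```

In contrast to the uniform-plus-random sampling criticized in the abstract, this scheme concentrates sampled frames where motion actually occurs, which is the property the energy-guided design is meant to provide.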

Authors

  • Qiang Liu
Blood Transfusion Laboratory, Jiangxi Provincial Blood Center, Nanchang 330052, Jiangxi, China.
  • Enqing Chen
    School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China.
  • Lei Gao
    Microscopy Core Facility, Biomedical Research Core Facilities, Westlake University, Hangzhou, China.
  • Chengwu Liang
    School of Information Engineering, Zhengzhou University, Zhengzhou 450000, China.
  • Hao Liu
    Key Laboratory of Development and Maternal and Child Diseases of Sichuan Province, Department of Pediatrics, Sichuan University, Chengdu, China.