Cascaded Parsing of Human-Object Interaction Recognition.

Journal: IEEE transactions on pattern analysis and machine intelligence
Published Date:

Abstract

This paper addresses the task of detecting and recognizing human-object interactions (HOI) in images. Considering the intrinsic complexity and structural nature of the task, we introduce a cascaded parsing network (CP-HOI) for multi-stage, structured HOI understanding. At each cascade stage, an instance detection module progressively refines HOI proposals and feeds them into a structured interaction reasoning module. Each of the two modules is also connected to its predecessor in the previous stage, enabling efficient cross-stage information propagation. The structured interaction reasoning module is built upon a graph parsing neural network (GPNN), which efficiently models potential HOI structures as graphs and mines rich context for comprehensive relation understanding. In particular, GPNN infers a parse graph that i) interprets meaningful HOI structures through a learnable adjacency matrix, and ii) predicts action (edge) labels. Within an end-to-end, message-passing framework, GPNN blends learning and inference, iteratively parsing HOI structures and reasoning over HOI representations (i.e., instance and relation features). Going beyond relation detection at the bounding-box level, we make our framework flexible enough to perform fine-grained, pixel-wise relation segmentation; this provides a new glimpse into better relation modeling. A preliminary version of our CP-HOI model reached 1st place in the ICCV 2019 Person in Context Challenge, on both relation detection and segmentation. In addition, our CP-HOI shows promising results on two popular HOI recognition benchmarks, i.e., V-COCO and HICO-DET.
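
The parse-graph inference described in the abstract (a learnable adjacency matrix, iterative message passing, and edge-level action labels) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the single-layer link, message, update, and readout functions, the parameter names (w_link, w_msg, w_upd, w_act), the helper init_params, and the random weights are all assumptions chosen only to make the data flow concrete.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def init_params(dim, num_actions, seed=0):
    """Random weights for the sketch (hypothetical; a real model would learn them)."""
    rng = np.random.default_rng(seed)
    return {
        "w_link": rng.normal(scale=0.1, size=(3 * dim,)),             # edge -> link score
        "w_msg": rng.normal(scale=0.1, size=(3 * dim, dim)),          # edge -> message
        "w_upd": rng.normal(scale=0.1, size=(2 * dim, dim)),          # node update
        "w_act": rng.normal(scale=0.1, size=(3 * dim, num_actions)),  # edge -> actions
    }


def pair_features(h, edge_feats):
    """Concatenate sender, receiver, and relation features for every node pair."""
    n, d = h.shape
    sender = np.broadcast_to(h[:, None, :], (n, n, d))
    receiver = np.broadcast_to(h[None, :, :], (n, n, d))
    return np.concatenate([sender, receiver, edge_feats], axis=-1)  # (N, N, 3D)


def parse_graph(node_feats, edge_feats, params, num_iters=3):
    """Iteratively infer a soft adjacency matrix and per-edge action scores.

    node_feats: (N, D) instance features (humans and objects).
    edge_feats: (N, N, D) relation features for each ordered pair.
    """
    h = node_feats.copy()
    for _ in range(num_iters):
        pair = pair_features(h, edge_feats)
        # Link function: soft adjacency in [0, 1], no self-loops.
        adj = sigmoid(pair @ params["w_link"])            # (N, N)
        np.fill_diagonal(adj, 0.0)
        # Message function, weighted by the current adjacency estimate.
        msg = np.tanh(pair @ params["w_msg"])             # (N, N, D)
        agg = (adj[..., None] * msg).sum(axis=0)          # (N, D), summed over senders
        # Update function: refresh each node state from its incoming messages.
        h = np.tanh(np.concatenate([h, agg], axis=-1) @ params["w_upd"])
    # Readout: recompute the parse graph from the final node states.
    pair = pair_features(h, edge_feats)
    adj = sigmoid(pair @ params["w_link"])
    np.fill_diagonal(adj, 0.0)
    action_scores = sigmoid(pair @ params["w_act"])       # (N, N, A), multi-label
    return adj, action_scores
```

Calling parse_graph with N instance features and N x N relation features returns an N x N soft adjacency matrix and an N x N x A array of action scores. Multiplying the two (adj[..., None] * action_scores) is one simple way, assumed here, to suppress actions on pairs the graph considers unrelated.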

Authors

  • Tianfei Zhou
  • Siyuan Qi
    Department of Computer Science, UCLA, Los Angeles, CA 90095, USA.
  • Wenguan Wang
  • Jianbing Shen
    Inception Institute of Artificial Intelligence, Abu Dhabi, United Arab Emirates.
  • Song-Chun Zhu
    Department of Computer Science, UCLA, Los Angeles, CA 90095, USA. markedmonds@ucla.edu yixin.zhu@ucla.edu sczhu@stat.ucla.edu.