Towards zero-shot human-object interaction detection via vision-language integration.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Human-object interaction (HOI) detection aims to locate human-object pairs and identify their interaction categories in images. Most existing methods focus on supervised learning, which relies on extensive manual HOI annotations. Such heavy reliance on closed-set supervision limits their ability to generalize to unseen object categories. Inspired by the remarkable zero-shot capabilities of vision-language models (VLMs), we propose a novel framework, termed Knowledge Integration to HOI (KI2HOI), that effectively integrates VLM knowledge to improve zero-shot HOI detection. Specifically, we propose a ho-pair (human-object pair) encoder, together with an interaction-specific semantic representation decoder, to supply contextual and interaction-level representations to our model. Additionally, we propose two fusion strategies to facilitate the transfer of prior VLM knowledge: visual-level fusion, which produces interaction features with richer global context, and language-level fusion, which further strengthens the VLM's capability for HOI detection. Extensive experiments on the mainstream HICO-DET and V-COCO datasets demonstrate that our model outperforms previous methods in various zero-shot and fully-supervised settings. The source code is available at https://github.com/xwyscut/K2HOI.
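The language-level fusion described above typically works by matching each human-object pair's visual feature against frozen VLM text embeddings of interaction phrases, so unseen categories can be scored purely from their text prompts. The paper's exact fusion mechanism is not specified in this abstract, so the following is a minimal, hypothetical NumPy sketch of CLIP-style zero-shot scoring; all function names, shapes, and the temperature value are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    # Unit-normalize embeddings so a dot product equals cosine similarity.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)

def score_interactions(visual_feats, text_embeds, temperature=0.07):
    """Hypothetical zero-shot HOI scoring (not the paper's exact method).

    visual_feats: (num_pairs, dim) interaction features, one per detected
                  human-object pair.
    text_embeds:  (num_classes, dim) frozen VLM text embeddings of prompts
                  such as "a photo of a person riding a bicycle"; unseen
                  classes only need a prompt, not training labels.
    Returns softmax class probabilities of shape (num_pairs, num_classes).
    """
    v = l2_normalize(visual_feats)
    t = l2_normalize(text_embeds)
    logits = (v @ t.T) / temperature                # scaled cosine similarity
    logits -= logits.max(axis=-1, keepdims=True)    # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
feats = rng.normal(size=(2, 512))   # two detected human-object pairs
texts = rng.normal(size=(5, 512))   # five candidate interaction classes
scores = score_interactions(feats, texts)
print(scores.shape)                 # (2, 5)
```

Because the text embeddings come from a frozen VLM, adding a new interaction category at inference time only requires embedding a new prompt, which is what enables the zero-shot setting.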

Authors

  • Weiying Xue
    School of Future Technology, South China University of Technology, Guangzhou, Guangdong 511400, PR China. Electronic address: 202320163283@mail.scut.edu.cn.
  • Qi Liu
    National Institute of Traditional Chinese Medicine Constitution and Preventive Medicine, Beijing University of Chinese Medicine, Beijing, China.
  • Yuxiao Wang
    School of Information Science and Engineering, Shandong University, Qingdao, Shandong 266237, PR China.
  • Zhenao Wei
    School of Future Technology, South China University of Technology, Guangzhou, Guangdong 511400, PR China. Electronic address: wza@scut.edu.cn.
  • Xiaofen Xing
  • Xiangmin Xu