GCapNet-FSD: A Heterogeneous Graph Capsule Network for Few-Shot Object Detection.

Journal: Neural Networks: The Official Journal of the International Neural Network Society
Published Date:

Abstract

Few-shot object detection is a challenging task that aims to quickly adapt detectors to novel objects given only a minimal number of annotated examples. Although promising results have been achieved, performance still declines significantly when the number of shots drops sharply. We argue that this shot sensitivity stems from severe under-utilization of both the internal few-shot data and external common knowledge bases. The key question is therefore how to extract more discriminative concepts to compensate for the insufficient task-specific information in the limited novel dataset. We propose a novel heterogeneous Graph Capsule Network for Few-Shot object Detection, named GCapNet-FSD. Specifically, we design a heterogeneous graph that combines high-level visual capsule neurons derived from the internal few-shot data with stable semantic embeddings from an easily available external corpus, yielding more discriminative task-specific representations. As a result, GCapNet-FSD is stable and robust across shot settings. Our design outperforms current methods in the 1-shot setting of every split, by up to +3.7% on PASCAL VOC07&12 and +0.4% on the challenging MS COCO benchmark, and extensive experiments on both benchmarks demonstrate that GCapNet-FSD delivers shot-stable detection performance with significantly better results at lower shots.
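
To make the core fusion idea concrete, below is a minimal, hypothetical sketch of how visual capsule vectors and external word embeddings might be placed in one heterogeneous graph and fused by cross-type message passing. All names, dimensions, and the attention-style aggregation are illustrative assumptions for exposition; this is not the authors' implementation of GCapNet-FSD.

```python
# Hypothetical sketch: fuse visual capsule nodes (internal few-shot data) with
# semantic embedding nodes (external corpus) over a dense bipartite graph.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HeteroCapsuleFusion(nn.Module):
    def __init__(self, caps_dim=16, sem_dim=300, hid_dim=64):
        super().__init__()
        # Project both node types into a shared space so edge weights are comparable.
        self.caps_proj = nn.Linear(caps_dim, hid_dim)
        self.sem_proj = nn.Linear(sem_dim, hid_dim)
        self.out = nn.Linear(2 * hid_dim, hid_dim)

    def forward(self, caps, sem):
        # caps: [N_caps, caps_dim] visual capsule nodes (e.g., one per novel class)
        # sem:  [N_sem, sem_dim]   semantic nodes (e.g., GloVe-style class embeddings)
        h_c = self.caps_proj(caps)                                   # [N_caps, hid]
        h_s = self.sem_proj(sem)                                     # [N_sem, hid]
        # Attention-style weights on the dense capsule-to-semantic edges.
        attn = torch.softmax(h_c @ h_s.t() / h_c.size(-1) ** 0.5, dim=-1)
        msg = attn @ h_s                     # aggregated semantic message per capsule
        # Fuse each capsule with its semantic message into a task-specific feature.
        return F.relu(self.out(torch.cat([h_c, msg], dim=-1)))       # [N_caps, hid]

if __name__ == "__main__":
    model = HeteroCapsuleFusion()
    caps = torch.randn(5, 16)      # 5 capsule vectors from few-shot data
    sem = torch.randn(20, 300)     # 20 word embeddings from an external corpus
    print(model(caps, sem).shape)  # torch.Size([5, 64])
```

The intent of such a design, as the abstract describes it, is that the stable external semantic nodes regularize the scarce visual evidence, so the fused representations degrade less as the number of shots shrinks.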

Authors

  • Jiaxu Leng
    School of Computer Science and Technology, University of Chinese Academy of Sciences, China.
  • Qianru Chen
    School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, 400064, China.
  • Taiyue Chen
    School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, 400064, China.
  • Feng Gao
    Department of Statistics, UCLA, Los Angeles, CA 90095, USA.
  • Ji Gan
    School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, 400064, China.
  • Changjun Gu
    School of Computer Science and Technology, Chongqing University of Posts and Telecommunications, Chongqing, China; Chongqing Key Laboratory of Image Cognition, Chongqing University of Posts and Telecommunications, Chongqing, 400065, China; Chongqing Institute for Brain and Intelligence, Guangyang Bay Laboratory, Chongqing, 400064, China.
  • Xinbo Gao