Visual-guided attentive attributes embedding for zero-shot learning.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

Zero-shot learning (ZSL) aims to learn a classifier for unseen classes by exploiting both training data from seen classes and external knowledge. In many visual tasks such as image classification, a set of high-level attributes describing the semantic properties of classes is used as the external knowledge that bridges seen and unseen classes. While previous ZSL studies usually treat all attributes equally, we observe that the contributions of different attributes vary significantly during model training. To adaptively exploit the discriminative information embedded in different attributes, we propose a novel encoder-decoder framework with an attribute-level attention mechanism for zero-shot learning. Specifically, by mapping the visual features into a semantic space, the more discriminative attributes are emphasized with larger attention weights. Further, the attentive attributes and the class prototypes are simultaneously decoded back to the visual space, which eases the hubness problem. Finally, labels are predicted in the visual space. Extensive experiments on multiple benchmark datasets demonstrate that our proposed model achieves a significant boost over several state-of-the-art methods on the ZSL task and competitive results on the generalized ZSL task.
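The abstract describes the architecture only at a high level. The following is a minimal PyTorch sketch of one plausible reading, not the authors' released code: the layer shapes, the sigmoid attention scoring, the shared decoder, and the cosine nearest-prototype classification in visual space are all assumptions introduced here for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentiveAttributeZSL(nn.Module):
    """Hypothetical encoder-decoder with attribute-level attention.

    The encoder maps visual features into the semantic (attribute) space,
    an attention module reweights individual attributes, and a shared
    decoder maps both the attentive attributes and the class prototypes
    back to the visual space, where labels are predicted.
    """

    def __init__(self, visual_dim: int, attr_dim: int):
        super().__init__()
        self.encoder = nn.Linear(visual_dim, attr_dim)  # visual -> semantic
        self.decoder = nn.Linear(attr_dim, visual_dim)  # semantic -> visual
        self.attn = nn.Linear(attr_dim, attr_dim)       # one weight per attribute

    def forward(self, x, prototypes):
        # x: (batch, visual_dim); prototypes: (num_classes, attr_dim)
        sem = self.encoder(x)                        # predicted attribute vector
        weights = torch.sigmoid(self.attn(sem))      # attribute-level attention (assumed form)
        attentive = weights * sem                    # emphasize discriminative attributes
        recon = self.decoder(attentive)              # decode sample back to visual space
        proto_vis = self.decoder(prototypes)         # decode class prototypes likewise
        # match in the visual space, which is said to ease the hubness problem
        logits = F.normalize(recon, dim=-1) @ F.normalize(proto_vis, dim=-1).t()
        return logits, recon

# Toy usage: 8 images with 2048-d features, 5 classes described by 85 attributes.
model = AttentiveAttributeZSL(visual_dim=2048, attr_dim=85)
x = torch.randn(8, 2048)
prototypes = torch.rand(5, 85)
logits, _ = model(x, prototypes)
pred = logits.argmax(dim=1)  # labels predicted in the visual space
print(pred.shape)  # torch.Size([8])
```

At test time, passing the unseen-class attribute prototypes as `prototypes` yields zero-shot predictions; the training losses (e.g., reconstruction and classification terms) are not specified in the abstract and are omitted here.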

Authors

  • Rui Zhang
    Department of Cardiology, Zhongda Hospital, Medical School of Southeast University, Nanjing, China.
  • Qi Zhu
    Medical Research Center, Southwestern Hospital, Army Medical University, Chongqing 400037, P.R. China.
  • Xiangyu Xu
    College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China.
  • Daoqiang Zhang
    College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
  • Sheng-Jun Huang
    College of Computer Science and Technology, Nanjing University of Aeronautics and Astronautics, Nanjing, China.