Toward Disentangled and Controllable Deep Metric Learning With Human-Like Concept Decomposition.

Journal: IEEE Transactions on Neural Networks and Learning Systems
Published Date:

Abstract

Deep metric learning (DML) has made significant advances in learning discriminative embeddings for images, playing a crucial role in various vision tasks. However, existing methods typically rely on deep neural networks to extract holistic embeddings, which are challenging to disentangle and interpret. To address this issue, we take inspiration from human cognition, where objects are decomposed into distinct concepts for better understanding. Specifically, we propose the concept metrics network (CMN) to achieve disentangled and controllable DML. CMN begins by initializing learnable concept vectors to represent various visual concepts. These vectors are then associated with regional visual features via a cross-attention mechanism, ensuring that each vector corresponds to specific visual properties. Finally, the concept values, determined by the presence of each concept in the image, form the output embedding. Comprehensive experiments demonstrate that CMN effectively disentangles visual concepts, with each embedding dimension corresponding to a specific concept. Our method not only outperforms existing state-of-the-art methods in the conventional DML application (i.e., image retrieval), but also enables more flexible and controllable applications. The code is available at https://github.com/shchen0001/CMN.
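The pipeline described in the abstract (concept vectors, cross-attention over regional features, one concept value per embedding dimension) can be illustrated with a minimal sketch. This is not the authors' implementation; the linked repository is the authoritative source. The function names `concept_embedding` and `masked_distance`, the sizes `K`, `N`, `D`, the random inputs, and in particular the choice to score a concept's presence as the inner product between the concept vector and its attended feature are all assumptions made for illustration.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def concept_embedding(concepts, regions):
    """Map regional features to one scalar value per concept.

    concepts: K concept vectors (learnable in a real model; fixed here).
    regions:  N regional feature vectors of the same dimension d.
    Returns a list of K concept values, L2-normalised.
    """
    d = len(concepts[0])
    values = []
    for c in concepts:
        # Cross-attention: the concept vector is the query, the regional
        # features serve as both keys and values.
        weights = softmax([dot(c, r) / math.sqrt(d) for r in regions])
        attended = [sum(w * r[i] for w, r in zip(weights, regions))
                    for i in range(d)]
        # ASSUMPTION: presence is scored as <concept, attended feature>.
        values.append(dot(c, attended))
    norm = math.sqrt(sum(v * v for v in values)) or 1.0
    return [v / norm for v in values]


def masked_distance(e1, e2, keep):
    """Euclidean distance restricted to the concept indices in `keep`."""
    return math.sqrt(sum((e1[k] - e2[k]) ** 2 for k in keep))


rng = random.Random(0)
K, N, D = 4, 6, 8  # hypothetical: concepts, regions per image, feature dim
concepts = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
image_a = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
image_b = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]

emb_a = concept_embedding(concepts, image_a)
emb_b = concept_embedding(concepts, image_b)

print(len(emb_a))                                  # one dimension per concept
print(masked_distance(emb_a, emb_b, keep=[0, 2]))  # compare on two concepts only
```

Because each output dimension is tied to a single concept in this sketch, restricting the distance to a subset of indices, as `masked_distance` does, is one plausible way to picture the controllable retrieval the abstract mentions.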

Authors

  • Shuhuang Chen
  • Shiming Chen
  • Shuo Ye
  • Yuetian Wang
  • Xinge You
    School of Electronic Information and Communications, Huazhong University of Science and Technology, Wuhan, China; Shenzhen Huazhong University of Science and Technology Research Institute, China.
