Toward Disentangled and Controllable Deep Metric Learning With Human-Like Concept Decomposition.
Journal:
IEEE Transactions on Neural Networks and Learning Systems
Published Date:
Jul 30, 2025
Abstract
Deep metric learning (DML) has advanced significantly in learning discriminative embeddings for images and plays a crucial role in various vision tasks. However, existing methods typically rely on deep neural networks to extract holistic embeddings, which are difficult to disentangle and interpret. To address this issue, we take inspiration from human cognition, where objects are decomposed into distinct concepts for better understanding. Specifically, we propose the concept metrics network (CMN) to achieve disentangled and controllable DML. CMN begins by initializing learnable concept vectors to represent various visual concepts. These vectors are then associated with regional visual features via a cross-attention mechanism, ensuring that each vector corresponds to specific visual properties. Finally, the concept values, determined by the presence of each concept in the image, form the output embedding. Comprehensive experiments demonstrate that CMN effectively disentangles visual concepts, with each embedding dimension corresponding to a specific concept. Our method not only outperforms existing state-of-the-art methods in the conventional DML application (i.e., image retrieval), but also enables more flexible and controllable applications. The code is available at https://github.com/shchen0001/CMN.
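The pipeline described in the abstract (concept vectors, cross-attention over regional features, one concept value per embedding dimension) can be illustrated with a minimal sketch. This is not the authors' implementation; the released code at the URL above is the reference. The function names (`concept_embedding`, `masked_distance`), the random stand-ins for learned concept vectors and regional features, and the scoring rule that turns an attended feature into a concept value are all assumptions made for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def concept_embedding(concepts, regions):
    """Map K concept vectors and N regional features to a K-dim embedding."""
    d = len(concepts[0])
    embedding = []
    for c in concepts:
        # Cross-attention: the concept vector is the query,
        # the regional features act as both keys and values.
        weights = softmax([dot(c, r) / math.sqrt(d) for r in regions])
        attended = [sum(w * r[i] for w, r in zip(weights, regions))
                    for i in range(d)]
        # Assumed scoring rule (not from the paper): the concept value is the
        # similarity between the concept and the feature it attended to.
        embedding.append(dot(c, attended))
    return embedding


def masked_distance(e1, e2, mask):
    """Euclidean distance restricted to concept dimensions where mask is 1."""
    return math.sqrt(sum(m * (a - b) ** 2 for a, b, m in zip(e1, e2, mask)))


# Random stand-ins for learned concept vectors and backbone region features.
rng = random.Random(0)
K, N, D = 4, 6, 8  # concepts, regions per image, feature dimension
concepts = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(K)]
regions_a = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]
regions_b = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(N)]

emb_a = concept_embedding(concepts, regions_a)
emb_b = concept_embedding(concepts, regions_b)

# Compare the two images on all concepts, then on concepts 0 and 3 only.
full = masked_distance(emb_a, emb_b, [1, 1, 1, 1])
partial = masked_distance(emb_a, emb_b, [1, 0, 0, 1])
```

Because each embedding dimension is tied to one concept, restricting the distance to a subset of dimensions, as `masked_distance` does here, is one plausible way to read the "controllable" retrieval the abstract mentions.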