CONCORD: Concept-Informed Diffusion for Dataset Distillation

Journal: arXiv

Published Date: May 23, 2025

Abstract

Dataset distillation (DD) has witnessed significant progress in creating small datasets that encapsulate rich information from large original ones. Particularly, methods based on generative priors show promising performance, while maintaining computational efficiency and cross-architecture generalization. However, the generation process lacks explicit controllability for each sample. Previous distillation methods primarily match the real distribution from the perspective of the entire dataset, whereas overlooking concept completeness at the instance level. The missing or incorrectly represented object details cannot be efficiently compensated due to the constrained sample amount typical in DD settings. To this end, we propose incorporating the concept understanding of large language models (LLMs) to perform Concept-Informed Diffusion (CONCORD) for dataset distillation. Specifically, distinguishable and fine-grained concepts are retrieved based on category labels to inform the denoising process and refine essential object details. By integrating these concepts, the proposed method significantly enhances both the controllability and interpretability of the distilled image generation, without relying on pre-trained classifiers. We demonstrate the efficacy of CONCORD by achieving state-of-the-art performance on ImageNet-1K and its subsets. The code implementation is released in https://github.com/vimar-gu/CONCORD.

Authors

Jianyang Gu
Haonan Wang
Ruoxi Jia
Saeed Vahidian
Vyacheslav Kungurtsev
Wei Jiang
Yiran Chen

External Resources

View on arXiv arXiv (http://arxiv.org/abs/2505.18358v1)

CONCORD: Concept-Informed Diffusion for Dataset Distillation

Abstract

Authors

Categories

External Resources

Popular Topics

Recent Journals

CONCORD: Concept-Informed Diffusion for Dataset Distillation

Abstract

Authors

Categories

External Resources

Stay Ahead of Medical AI

Popular Topics

Recent Journals