Enhancing Visual Analysis in Person Re-Identification With Vision-Language Models.
Journal:
IEEE computer graphics and applications
Published Date:
Jul 28, 2025
Abstract
Image-based person re-identification aims to match individuals across multiple cameras. Despite advances in machine learning, their effectiveness in real-world scenarios remains limited, often leaving users to handle fine-grained matching manually. Recent work has explored textual information as auxiliary cues, but existing methods generate coarse descriptions and fail to integrate them effectively into retrieval workflows. To address these issues, we adopt a vision-language model fine-tuned with domain-specific knowledge to generate detailed textual descriptions and keywords for pedestrian images. We then create a joint search space combining visual and textual information, using image clustering and keyword co-occurrence to build a semantic layout. Additionally, we introduce a dynamic spiral word cloud algorithm to improve visual presentation and enhance semantic associations. Finally, we conduct case studies, a user study, and expert feedback, demonstrating the usability and effectiveness of our system.
Authors
Keywords
No keywords available for this article.