A Multi-Level Relation-Aware Transformer model for occluded person re-identification.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Occluded person re-identification (Re-ID) is a challenging task, as pedestrians are often obstructed by various occlusions, such as non-pedestrian objects or non-target pedestrians. Previous methods have relied heavily on auxiliary models, such as human pose estimators, to obtain information about unoccluded regions. However, these auxiliary models fall short in accounting for pedestrian occlusions, thereby leading to potential misrepresentations. In addition, some previous works learned feature representations from single images, ignoring the potential relations among samples. To address these issues, this paper introduces a Multi-Level Relation-Aware Transformer (MLRAT) model for occluded person Re-ID. The model comprises two novel modules: Patch-Level Relation-Aware (PLRA) and Sample-Level Relation-Aware (SLRA). PLRA learns fine-grained local features by modeling the structural relations between key patches, bypassing the dependency on auxiliary models. It adopts a model-free method to select key patches that have high semantic correlation with the final pedestrian representation. In particular, to alleviate the interference of occlusion, PLRA captures the structural relations among key patches via a two-layer Graph Convolution Network (GCN), effectively guiding the local feature fusion and learning. SLRA is designed to help the model learn discriminative features by modeling the relations among samples. Specifically, to mitigate noisy relations with irrelevant samples, we present a Relation-Aware Transformer (RAT) block to capture the relations among neighbors. Furthermore, to bridge the gap between the training and testing phases, a self-distillation method is employed to transfer the sample-level relations captured by SLRA to the backbone. Extensive experiments are conducted on four occluded datasets, two partial datasets and two holistic datasets. The results show that the proposed MLRAT model significantly outperforms existing baselines on the four occluded datasets, while maintaining top performance on the two partial datasets and the two holistic datasets.
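
The PLRA idea described above (select key patches by their correlation with the pedestrian representation, then relate them with a two-layer GCN) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the abstract does not specify the selection score, the graph construction, or the layer sizes, so the cosine-similarity scoring, the similarity-based adjacency, the random weights, and all function names below (`select_key_patches`, `normalized_adjacency`, `two_layer_gcn`) are assumptions made purely for illustration.

```python
import numpy as np

def select_key_patches(patch_tokens, global_token, k):
    """Return indices of the k patches most cosine-similar to the global token.

    Assumption: cosine similarity stands in for the 'semantic correlation'
    mentioned in the abstract; the paper may use a different score.
    """
    p = patch_tokens / np.linalg.norm(patch_tokens, axis=1, keepdims=True)
    g = global_token / np.linalg.norm(global_token)
    scores = p @ g
    return np.argsort(-scores)[:k]

def normalized_adjacency(feats):
    """Symmetric-normalized adjacency built from non-negative cosine similarity.

    Assumption: the graph over key patches is derived from feature similarity.
    The diagonal equals 1, which acts as a self-loop for every node.
    """
    f = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    a = np.maximum(f @ f.T, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def two_layer_gcn(feats, adj, w1, w2):
    """Two graph-convolution layers with a ReLU in between."""
    hidden = np.maximum(adj @ feats @ w1, 0.0)
    return adj @ hidden @ w2

# Toy data: 16 patch tokens of dimension 8 (hypothetical sizes).
rng = np.random.default_rng(0)
patch_tokens = rng.normal(size=(16, 8))
global_token = rng.normal(size=8)

idx = select_key_patches(patch_tokens, global_token, k=4)
key_patches = patch_tokens[idx]
adj = normalized_adjacency(key_patches)

# Random weights stand in for learned parameters.
w1 = rng.normal(size=(8, 8)) * 0.1
w2 = rng.normal(size=(8, 8)) * 0.1
local_features = two_layer_gcn(key_patches, adj, w1, w2)
```

In this sketch, `local_features` holds one relation-aware vector per selected patch; how such vectors are fused into the final representation is left open here because the abstract does not describe it.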

Authors

  • Guorong Lin
    School of Artificial Intelligence, South China Normal University, Foshan 528225, China. Electronic address: linguorong@m.scnu.edu.cn.
  • Zhiqiang Bao
    School of Computer Science, South China Normal University, Guangzhou 510631, China. Electronic address: ZhiqiangBAO1995@163.com.
  • Zhenhua Huang
    School of Computer Science, South China Normal University, Guangzhou, China. Electronic address: huangzhenhua@m.scnu.edu.cn.
  • Zuoyong Li
    Fujian Provincial Key Laboratory of Information Processing and Intelligent Control, Minjiang University, Fuzhou, 350121, Fujian, China.
  • Wei-Shi Zheng
    School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China; Guangdong Province Key Laboratory of Computational Science, Guangzhou 510275, China. Electronic address: wszheng@ieee.org.
  • Yunwen Chen
    Research and Development Department, DataGrand Inc., Shanghai 201203, China. Electronic address: chenyunwen@datagrand.com.