DRTN: Dual Relation Transformer Network with feature erasure and contrastive learning for multi-label image classification.
Journal:
Neural Networks: The Official Journal of the International Neural Network Society
PMID:
40048756
Abstract
The objective of the multi-label image classification (MLIC) task is to simultaneously identify multiple objects present in an image. Several researchers directly flatten 2D feature maps into 1D grid feature sequences and utilize a Transformer encoder to capture correlations among grid features and thus learn object relationships. Although these Transformer-based methods obtain promising results, they lose spatial information. In addition, current attention-based models often focus only on salient feature regions and ignore other potentially useful features that contribute to the MLIC task. To tackle these problems, we present a novel Dual Relation Transformer Network (DRTN) for the MLIC task, which can be trained in an end-to-end manner. Concretely, to compensate for the spatial information that grid features lose in the flattening operation, we adopt a grid aggregation scheme to generate pseudo-region features, which avoids the additional expensive annotations required to train an object detector. Then, a new dual relation enhancement (DRE) module is proposed to capture correlations between objects using two different visual features, thereby combining the complementary advantages of grid and pseudo-region features. After that, we design a new feature enhancement and erasure (FEE) module to learn discriminative features and mine additional potentially valuable features. By using an attention mechanism to discover the most salient feature regions and removing them with a region-level erasure strategy, our FEE module is able to mine other potentially useful features from the remaining parts. Further, we devise a novel contrastive learning (CL) module that encourages the foregrounds of salient and potential features to be closer while pushing both further away from background features. This compels our model to learn discriminative and valuable features more comprehensively. Extensive experiments demonstrate that our DRTN surpasses current MLIC models on three challenging benchmarks, i.e., MS-COCO 2014, PASCAL VOC 2007, and NUS-WIDE.
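
Illustrative sketch (not the authors' implementation). The short PyTorch snippet below is a minimal, assumption-laden illustration of the two mechanisms the abstract describes: attention-guided region-level erasure (FEE) and a contrast that pulls salient and potential foreground embeddings together while pushing them away from background embeddings (CL). The function names, tensor shapes, the erase ratio, and the InfoNCE-style loss form are all hypothetical stand-ins; the paper's actual modules may differ.

import torch
import torch.nn.functional as F

def region_level_erasure(features, attention, erase_ratio=0.3):
    # features:  (B, N, D) grid or pseudo-region features
    # attention: (B, N) per-region saliency scores
    # Zero out the top-k most attended regions so the model must
    # mine potentially useful features from the remaining parts.
    B, N, D = features.shape
    k = max(1, int(N * erase_ratio))                      # erase_ratio is an assumed value
    top_idx = attention.topk(k, dim=1).indices            # most salient regions
    mask = torch.ones(B, N, 1, device=features.device)
    mask.scatter_(1, top_idx.unsqueeze(-1), 0.0)          # erase them
    return features * mask

def foreground_background_contrast(salient_fg, potential_fg, background, temperature=0.1):
    # salient_fg, potential_fg, background: (B, D) pooled embeddings
    # Pull salient and potential foregrounds together, push both away
    # from background (InfoNCE-style stand-in for the CL module).
    z_s = F.normalize(salient_fg, dim=-1)
    z_p = F.normalize(potential_fg, dim=-1)
    z_b = F.normalize(background, dim=-1)
    pos    = (z_s * z_p).sum(-1) / temperature            # foreground-foreground similarity
    neg_sb = (z_s * z_b).sum(-1) / temperature            # salient foreground vs background
    neg_pb = (z_p * z_b).sum(-1) / temperature            # potential foreground vs background
    logits = torch.stack([pos, neg_sb, neg_pb], dim=1)
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)                # positive pair sits at index 0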