Learning Consistent Semantic Representation for Chest X-ray via Anatomical Localization in Self-Supervised Pre-Training.
Journal:
IEEE Journal of Biomedical and Health Informatics
PMID:
40030350
Abstract
Despite the similar global structures of chest X-ray (CXR) images, the same anatomy exhibits varying appearances across images, including differences in local texture, shape, and color. Learning consistent representations of anatomical semantics across these diverse appearances poses a great challenge for self-supervised pre-training on CXR images. To address this challenge, we propose two new pre-training tasks: inner-image anatomy localization (IIAL) and cross-image anatomy localization (CIAL). Leveraging the relatively stable positions of identical anatomy across images, we use position information directly as supervision to learn consistent semantic representations. Specifically, IIAL adopts a coarse-to-fine heatmap localization approach to correlate anatomical semantics with positions, while CIAL leverages feature affine alignment and heatmap localization to establish correspondence between identical anatomical semantics across different images, despite their diverse appearances. Furthermore, we introduce a unified end-to-end pre-training framework, anatomy-aware representation learning (AARL), which integrates IIAL, CIAL, and a pixel restoration task. The advantages of AARL are that it 1) preserves appearance diversity and 2) trains in a simple end-to-end manner without complicated preprocessing. Extensive experiments on six downstream tasks, spanning classification and segmentation in various application scenarios, demonstrate that AARL 1) has stronger representation and transfer ability, 2) is annotation-efficient, reducing the demand for labeled data, and 3) improves sensitivity in detecting various pathological and anatomical patterns.
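To make the heatmap-localization idea concrete, below is a minimal PyTorch sketch (not the authors' code) of using anatomical positions as supervision: an encoder produces features, a head predicts one heatmap per anatomical query point, and the loss matches predicted heatmaps to Gaussian targets centered at those positions. The toy encoder, head, point coordinates, and shapes are illustrative assumptions; the paper's coarse-to-fine refinement and cross-image affine alignment are not reproduced here.

# Hedged sketch of heatmap-based anatomy localization as a pre-training objective.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ToyEncoder(nn.Module):
    """Stand-in CNN encoder producing a feature map from a CXR image (assumed)."""
    def __init__(self, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, channels, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, 3, stride=2, padding=1), nn.ReLU(),
        )
    def forward(self, x):
        return self.net(x)

class AnatomyLocalizationHead(nn.Module):
    """Predicts one heatmap per anatomical query point from encoder features."""
    def __init__(self, channels=32, num_points=4):
        super().__init__()
        self.head = nn.Conv2d(channels, num_points, 1)
    def forward(self, feats):
        return self.head(feats)  # (B, num_points, H', W')

def gaussian_heatmaps(points, size, sigma=2.0):
    """Target Gaussian heatmaps centered at (x, y) points in feature-map
    coordinates; points: (B, P, 2), size: (H, W)."""
    h, w = size
    ys = torch.arange(h).view(1, 1, h, 1).float()
    xs = torch.arange(w).view(1, 1, 1, w).float()
    px = points[..., 0].view(*points.shape[:2], 1, 1)
    py = points[..., 1].view(*points.shape[:2], 1, 1)
    return torch.exp(-((xs - px) ** 2 + (ys - py) ** 2) / (2 * sigma ** 2))

# Toy forward/backward pass: a fake CXR batch with assumed anatomical points,
# mirroring the idea that relatively stable positions supervise semantic consistency.
encoder, head = ToyEncoder(), AnatomyLocalizationHead()
images = torch.randn(2, 1, 64, 64)                                 # fake CXR batch
points = torch.tensor([[[4., 5.], [10., 12.], [3., 9.], [8., 2.]]]).repeat(2, 1, 1)
feats = encoder(images)                                            # (2, 32, 16, 16)
pred = head(feats)                                                 # (2, 4, 16, 16)
target = gaussian_heatmaps(points, pred.shape[-2:])
loss = F.mse_loss(torch.sigmoid(pred), target)                     # localization objective
loss.backward()

In this sketch the "anatomy" queries are arbitrary coordinates; in practice such positions would come from the relatively stable layout of CXR anatomy, and the same loss could be applied within one image (as in IIAL) or between aligned images (as in CIAL).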