Skin Lesion Phenotyping via Nested Multi-modal Contrastive Learning
Journal:
arXiv
Published Date:
May 29, 2025
Abstract
We introduce SLIMP (Skin Lesion Image-Metadata Pre-training) for learning
rich representations of skin lesions through a novel nested contrastive
learning approach that captures complex relationships between images and
metadata. Melanoma detection and skin lesion classification based solely on
images, pose significant challenges due to large variations in imaging
conditions (lighting, color, resolution, distance, etc.) and lack of clinical
and phenotypical context. Clinicians typically follow a holistic approach for
assessing the risk level of the patient and for deciding which lesions may be
malignant and need to be excised, by considering the patient's medical history
as well as the appearance of other lesions of the patient. Inspired by this,
SLIMP combines the appearance and the metadata of individual skin lesions with
patient-level metadata relating to their medical record and other clinically
relevant information. By fully exploiting all available data modalities
throughout the learning process, the proposed pre-training strategy improves
performance compared to other pre-training strategies on downstream skin
lesions classification tasks highlighting the learned representations quality.