Exploring visual language models as a powerful tool in the diagnosis of Ewing Sarcoma
Journal:
arXiv
Published Date:
Jan 14, 2025
Abstract
Ewing's sarcoma (ES), characterized by a high density of small round blue
cells without structural organization, presents a significant health concern,
particularly among adolescents aged 10 to 19. Artificial intelligence-based
systems for automated analysis of histopathological images are promising to
contribute to an accurate diagnosis of ES. In this context, this study explores
the feature extraction ability of different pre-training strategies for
distinguishing ES from other soft tissue or bone sarcomas with similar
morphology in digitized tissue microarrays for the first time, as far as we
know. Vision-language supervision (VLS) is compared to fully-supervised
ImageNet pre-training within a multiple instance learning paradigm. Our
findings indicate a substantial improvement in diagnostic accuracy with the
adaption of VLS using an in-domain dataset. Notably, these models not only
enhance the accuracy of predicted classes but also drastically reduce the
number of trainable parameters and computational costs.