VLCD: Vision-Language Contrastive Distillation for Accurate and Efficient Automatic Placenta Analysis
Journal: arXiv
Published Date: Jun 2, 2025
Abstract
Pathological examination of the placenta is an effective method for detecting
and mitigating health risks associated with childbirth. Recent advancements in
AI have enabled the use of photographs of the placenta and pathology reports
for detecting and classifying signs of childbirth-related pathologies. However,
existing automated methods are computationally expensive, which limits their
deployability. We propose two modifications to vision-language contrastive
learning (VLC) frameworks to enhance their accuracy and efficiency: (1)
text-anchored vision-language contrastive knowledge distillation (VLCD), a new
knowledge distillation strategy for medical VLC pretraining, and (2)
unsupervised predistillation using a large natural-image dataset for improved
initialization. Our approach distills efficient neural networks that match or
surpass the teacher model in performance while achieving model compression and
acceleration. Our results showcase the value of unsupervised predistillation in
improving the performance and robustness of our approach, specifically for
lower-quality images. VLCD serves as an effective way to improve the efficiency
and deployability of medical VLC approaches, making AI-based healthcare
solutions more accessible, especially in resource-constrained environments.
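
The abstract does not spell out the VLCD objective, so the following is only an illustrative sketch of what a text-anchored contrastive distillation loss might look like. It assumes that a compact student image encoder is trained against text embeddings produced by a frozen teacher text encoder, using a standard InfoNCE loss in which the i-th image and the i-th report form the positive pair. The function names, the temperature value of 0.07, and the one-directional (image-to-text) form are assumptions for illustration, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a non-zero vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def text_anchored_contrastive_loss(student_img, teacher_txt, temperature=0.07):
    """Image-to-text InfoNCE loss over a batch (hypothetical formulation).

    student_img: list of image embeddings from the student encoder.
    teacher_txt: list of text embeddings from the frozen teacher (the anchors).
    Row i of each list is assumed to describe the same placenta case.
    """
    imgs = [l2_normalize(v) for v in student_img]
    txts = [l2_normalize(v) for v in teacher_txt]

    total = 0.0
    for i, img in enumerate(imgs):
        # Cosine similarity of image i to every text anchor, scaled by temperature.
        logits = [sum(a * b for a, b in zip(img, txt)) / temperature for txt in txts]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(z - peak) for z in logits))
        # Cross-entropy with the matching text (index i) as the target.
        total += log_denom - logits[i]
    return total / len(imgs)


# Toy batch: two cases with 2-dimensional embeddings.
anchors = [[1.0, 0.0], [0.0, 1.0]]
aligned_loss = text_anchored_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], anchors)
swapped_loss = text_anchored_contrastive_loss([[0.0, 1.0], [1.0, 0.0]], anchors)
```

In this toy batch the loss is close to zero when each student image embedding points at its own text anchor and much larger when the pairs are swapped, which is the behavior a distillation objective of this kind would rely on. In practice the loss would be computed with an autodiff framework so that gradients reach the student encoder while the teacher's text embeddings stay fixed.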