From Embeddings to Accuracy: Comparing Foundation Models for Radiographic Classification
Journal: arXiv
Published Date: May 16, 2025
Abstract
Foundation models, pretrained on extensive datasets, have significantly
advanced machine learning by providing robust and transferable embeddings
applicable to various domains, including medical imaging diagnostics. This
study evaluates the utility of embeddings derived from both general-purpose and
medical domain-specific foundation models for training lightweight adapter
models in multi-class radiography classification, focusing specifically on tube
placement assessment. Embeddings were extracted from a dataset of 8,842
radiographs, labeled into seven categories, using six foundation models:
DenseNet121, BiomedCLIP, Med-Flamingo, MedImageInsight, Rad-DINO, and
CXR-Foundation. Adapter models were subsequently trained on these embeddings using
classical machine learning algorithms. Among these combinations,
MedImageInsight embeddings paired with a support vector machine (SVM) adapter
yielded the highest mean area under the curve (mAUC) at 93.8%, followed closely
by Rad-DINO (91.1%) and CXR-Foundation (89.0%). In comparison, BiomedCLIP and
DenseNet121 exhibited moderate performance with mAUC scores of 83.0% and 81.8%,
respectively, whereas Med-Flamingo delivered the lowest performance at 75.1%.
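The abstract reports mAUC without stating how the per-class AUCs are combined. A common reading, assumed here rather than taken from the paper, is the macro average of one-vs-rest ROC AUCs over the seven classes. The following pure-Python sketch computes that quantity from an adapter's per-class scores; the function names `binary_auc` and `macro_ovr_auc` are hypothetical.

```python
from typing import Sequence


def binary_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """ROC AUC as P(random positive outscores random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(probs: Sequence[Sequence[float]], y_true: Sequence[int]) -> float:
    """Assumed mAUC: unweighted mean of one-vs-rest AUCs, one per class column."""
    n_classes = len(probs[0])
    aucs = []
    for k in range(n_classes):
        scores = [row[k] for row in probs]             # adapter score for class k
        labels = [1 if y == k else 0 for y in y_true]  # class k vs. all others
        aucs.append(binary_auc(scores, labels))
    return sum(aucs) / n_classes
```

The pairwise loop is O(n_pos · n_neg) and is written for clarity; on thousands of radiographs, a rank-based formula or scikit-learn's `roc_auc_score(y_true, probs, multi_class="ovr", average="macro")` would be the practical choice.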
Notably, most adapter models demonstrated computational efficiency, achieving
training within one minute and inference within seconds on CPU, underscoring
their practicality for clinical applications. Furthermore, fairness analyses on
adapters trained on MedImageInsight-derived embeddings indicated minimal
disparities, with gender differences in performance within 2% and standard
deviations across age groups not exceeding 3%. These findings indicate that
foundation model embeddings, especially those from MedImageInsight, facilitate
accurate, computationally efficient, and equitable diagnostic classification of
radiographs using lightweight adapters.
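The fairness summary can be read as two statistics over per-subgroup performance. The sketch below assumes that the gender difference is the max-minus-min AUC across gender groups and that the age-group figure is the population standard deviation of per-age-group AUCs, with "2%" and "3%" read as AUC percentage points (0.02 and 0.03 on a 0 to 1 scale). The abstract does not state these definitions, the age bins, or whether a sample or population standard deviation was used. The names `gender_gap`, `age_spread`, and `within_bounds` are hypothetical.

```python
import statistics
from typing import Mapping


def gender_gap(auc_by_gender: Mapping[str, float]) -> float:
    """Assumed metric: best minus worst AUC across gender groups."""
    values = list(auc_by_gender.values())
    return max(values) - min(values)


def age_spread(auc_by_age_group: Mapping[str, float]) -> float:
    """Assumed metric: population standard deviation of AUC across age groups."""
    return statistics.pstdev(list(auc_by_age_group.values()))


def within_bounds(
    auc_by_gender: Mapping[str, float],
    auc_by_age_group: Mapping[str, float],
    max_gap: float = 0.02,     # "within 2%", read as 2 AUC percentage points
    max_spread: float = 0.03,  # "not exceeding 3%", read the same way
) -> bool:
    """True if both subgroup statistics fall inside the stated thresholds."""
    return (gender_gap(auc_by_gender) <= max_gap
            and age_spread(auc_by_age_group) <= max_spread)
```

With exactly two gender groups, max minus min equals the absolute difference; the same definition also extends to more than two groups.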