Conformal uncertainty quantification to evaluate predictive fairness of foundation AI model for skin lesion classes across patient demographics
Journal:
arXiv
Published Date:
Mar 31, 2025
Abstract
Deep-learning-based diagnostic AI systems for medical images are
starting to match the performance of human experts. However, these
data-hungry, complex systems are inherently black boxes and are therefore slow to be
adopted for high-risk applications such as healthcare. This lack of
transparency is exacerbated in the case of recent large foundation models,
which are trained in a self-supervised manner on millions of data points to
provide robust generalisation across a range of downstream tasks, but
whose embeddings are produced by a process that is not
interpretable and hence not easily trusted in clinical applications. To
address this issue, we apply conformal analysis to quantify the
predictive uncertainty of a vision transformer (ViT)-based foundation model
across patient demographics (sex, age and ethnicity) for the
task of skin lesion classification on several public benchmark datasets.
A significant advantage of conformal analysis is that it is independent of
the underlying prediction method: it provides not only a coverage guarantee at
the population level but also an uncertainty score for each individual. We used a
model-agnostic dynamic F1-score-based sampling during model training, which
helped to mitigate class imbalance, and we investigate the effects on
uncertainty quantification (UQ) with and without this bias-mitigation step. Thus
we show how conformal UQ can be used as a fairness metric to evaluate the robustness of
the feature embeddings of the foundation model (Google DermFoundation) and thereby
advance the trustworthiness and fairness of clinical AI.
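
The idea of using conformal coverage as a fairness check can be illustrated with a minimal sketch of split conformal prediction. The abstract does not state which nonconformity score or calibration scheme the authors use, so the score below (one minus the predicted probability of the true class) is an assumption, and the function names `conformal_quantile`, `prediction_sets` and `groupwise_report` are hypothetical. The per-sample prediction-set size serves as an individual uncertainty score, and coverage is compared across demographic groups.

```python
import numpy as np

def conformal_quantile(cal_probs, cal_labels, alpha=0.1):
    """Calibrate a score threshold on held-out data (split conformal).

    Assumed nonconformity score: 1 - predicted probability of the true class.
    """
    n = len(cal_labels)
    scores = np.sort(1.0 - cal_probs[np.arange(n), cal_labels])
    # Finite-sample corrected rank: the ceil((n + 1) * (1 - alpha))-th smallest score.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        return 1.0  # too few calibration points: fall back to including every class
    return float(scores[k - 1])

def prediction_sets(probs, qhat):
    """Boolean mask of shape (n_samples, n_classes): True where a class is in the set."""
    return (1.0 - probs) <= qhat

def groupwise_report(sets, labels, groups):
    """Empirical coverage and mean set size for each demographic group."""
    report = {}
    for g in np.unique(groups):
        idx = np.flatnonzero(groups == g)
        covered = sets[idx, labels[idx]]      # is the true label inside the set?
        sizes = sets[idx].sum(axis=1)         # set size as a per-sample uncertainty score
        report[str(g)] = {
            "n": int(idx.size),
            "coverage": float(covered.mean()),
            "mean_set_size": float(sizes.mean()),
        }
    return report
```

In use, `cal_probs` and `probs` would be class probabilities from a classifier trained on the foundation-model embeddings, and `groups` would hold one demographic attribute (for example sex or an age band) per test sample. A group whose coverage falls well below the nominal level 1 - alpha, or whose sets are systematically larger, is one where the model is less reliable.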