Conformal Prediction for Long-Tailed Classification
Journal:
arXiv
Published Date:
Jul 9, 2025
Abstract
Many real-world classification problems, such as plant identification, have
extremely long-tailed class distributions. In order for prediction sets to be
useful in such settings, they should (i) provide good class-conditional
coverage, ensuring that rare classes are not systematically omitted from the
prediction sets, and (ii) be a reasonable size, allowing users to easily verify
candidate labels. Unfortunately, existing conformal prediction methods, when
applied to the long-tailed setting, force practitioners to make a binary choice
between small sets with poor class-conditional coverage or sets with very good
class-conditional coverage but that are extremely large. We propose methods
with guaranteed marginal coverage that smoothly trade off between set size and
class-conditional coverage. First, we propose a conformal score function,
prevalence-adjusted softmax, that targets a relaxed notion of class-conditional
coverage called macro-coverage. Second, we propose a label-weighted conformal
prediction method that allows us to interpolate between marginal and
class-conditional conformal prediction. We demonstrate our methods on Pl@ntNet
and iNaturalist, two long-tailed image datasets with 1,081 and 8,142 classes,
respectively.