Unleashing the Potential of Model Bias for Generalized Category Discovery
Journal: arXiv
Published Date: Dec 17, 2024
Abstract
Generalized Category Discovery is a significant and complex task that aims to
identify both known and undefined novel categories from a set of unlabeled
data, leveraging another labeled dataset containing only known categories. The
primary challenges stem from model bias induced by pre-training on only known
categories and the lack of precise supervision for novel ones, leading to
category bias towards known categories and category confusion among different
novel categories, which hinders models' ability to identify novel categories
effectively. To address these challenges, we propose a novel framework named
Self-Debiasing Calibration (SDC). Unlike prior methods that regard model bias
towards known categories as an obstacle to novel category identification, SDC
provides a novel insight into unleashing the potential of the bias to
facilitate novel category learning. Specifically, the output of the biased
model serves two key purposes. First, it provides an accurate modeling of
category bias, which can be utilized to measure the degree of bias and debias
the output of the current training model. Second, it offers valuable insights
for distinguishing different novel categories by transferring knowledge between
similar categories. Based on these insights, SDC dynamically adjusts the output
logits of the current training model using the output of the biased model. This
approach produces less biased logits to effectively address the issue of
category bias towards known categories, and generates more accurate pseudo
labels for unlabeled data, thereby mitigating category confusion for novel
categories. Experiments on three benchmark datasets show that SDC outperforms
state-of-the-art methods, especially in the identification of novel categories. Our code
and data are available at https://github.com/Lackel/SDC.
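
The abstract describes adjusting the current model's logits with the output of the biased model, but it does not state the exact rule. The sketch below is only an illustration of that general idea, not the SDC formulation: it assumes a simple prior-correction style adjustment in which the scaled log-probabilities of the biased model are subtracted from the current logits. The function names `debias_logits` and `pseudo_labels`, the coefficient `alpha`, and the toy numbers are all hypothetical.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def debias_logits(current_logits, biased_logits, alpha=1.0):
    """Illustrative adjustment (an assumption, not the paper's rule).

    The biased model's log-probabilities act as an estimate of how strongly
    each category is favored; subtracting them lowers the score of categories
    the biased model over-predicts. `alpha` is a hypothetical strength knob.
    """
    biased_log_prob = np.log(softmax(biased_logits) + 1e-12)
    return current_logits - alpha * biased_log_prob


def pseudo_labels(current_logits, biased_logits, alpha=1.0):
    """Pseudo labels taken as the argmax of the adjusted logits."""
    return debias_logits(current_logits, biased_logits, alpha).argmax(axis=-1)


# Toy example: category 0 is "known", categories 1 and 2 are "novel".
current = np.array([[2.0, 1.8, 0.1]])  # current model slightly prefers the known category
biased = np.array([[3.0, 0.0, 0.0]])   # biased model strongly prefers the known category

labels_without_adjustment = current.argmax(axis=-1)
labels_with_adjustment = pseudo_labels(current, biased, alpha=1.0)
```

In this toy case the unadjusted prediction is the known category, while the adjusted prediction moves to a novel category, because the biased model's strong preference for the known category is discounted. Setting `alpha=0.0` recovers the unadjusted prediction.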