Realistic Evaluation of Deep Partial-Label Learning Algorithms
Journal: arXiv
Published Date: Feb 14, 2025
Abstract
Partial-label learning (PLL) is a weakly supervised learning problem in which
each example is associated with multiple candidate labels and only one is the
true label. In recent years, many deep PLL algorithms have been developed to
improve model performance. However, we find that some earlier algorithms are
often underestimated and can outperform many later algorithms with more
complicated designs. In this paper, we examine PLL from an empirical
perspective and identify several critical but previously overlooked
issues. First, model selection for PLL is non-trivial, but has never been
systematically studied. Second, the experimental settings are highly
inconsistent, making it difficult to evaluate the effectiveness of the
algorithms. Third, there is a lack of real-world image datasets that are
compatible with modern network architectures. Based on these findings, we
propose PLENCH, the first Partial-Label learning bENCHmark to systematically
compare state-of-the-art deep PLL algorithms. We investigate the model
selection problem for PLL for the first time, and propose novel model selection
criteria with theoretical guarantees. We also create Partial-Label CIFAR-10
(PLCIFAR10), an image dataset of human-annotated partial labels collected from
Amazon Mechanical Turk, to provide a testbed for evaluating the performance of
PLL algorithms in more realistic scenarios. With PLENCH, researchers can
quickly and conveniently perform comprehensive, fair evaluations and verify the
effectiveness of newly developed algorithms. We hope that
PLENCH will facilitate standardized, fair, and practical evaluation of PLL
algorithms in the future.
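
To make the problem setting in the abstract concrete, the following is a minimal sketch of how a partial label might be represented and scored. It is a generic illustration, not PLENCH's API and not any specific algorithm evaluated in the paper. The function names, the 4-class example, and the choice of loss (negative log of the probability mass assigned to the candidate set) are assumptions made for this sketch.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def candidate_set_loss(logits, candidates):
    """Negative log of the total predicted probability on the candidate set.

    Illustrative loss chosen for this sketch (an assumption, not the paper's
    method): it is small whenever the model places its mass on any of the
    candidate labels, since the true label is known only to be one of them.
    """
    if not candidates:
        raise ValueError("candidate set must be non-empty")
    probs = softmax(logits)
    return -math.log(sum(probs[k] for k in candidates))

# Hypothetical 4-class example: the true label is 2, but the annotator
# supplied the candidate set {1, 2}.
logits = [0.1, 1.2, 2.0, -0.5]
loss = candidate_set_loss(logits, {1, 2})

# With a single candidate the loss reduces to ordinary cross-entropy;
# with every class as a candidate it carries no supervision and is zero.
supervised_loss = candidate_set_loss(logits, {2})
uninformative_loss = candidate_set_loss(logits, {0, 1, 2, 3})
```

The two limiting cases show why candidate-set size matters: a larger candidate set gives a weaker training signal, which is one reason evaluation settings need to be held consistent when comparing algorithms.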