Extending Dataset Pruning to Object Detection: A Variance-based Approach
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Dataset pruning -- selecting a small yet informative subset of training data
-- has emerged as a promising strategy for efficient machine learning, offering
significant reductions in computational cost and storage compared to
alternatives like dataset distillation. While pruning methods have shown strong
performance in image classification, their extension to more complex computer
vision tasks, particularly object detection, remains relatively underexplored.
In this paper, we present the first principled extension of classification
pruning techniques to the object detection domain, to the best of our
knowledge. We identify and address three key challenges that hinder this
transition: the Object-Level Attribution Problem, the Scoring Strategy Problem,
and the Image-Level Aggregation Problem. To overcome these, we propose tailored
solutions, including a novel scoring method called Variance-based Prediction
Score (VPS). VPS leverages both Intersection over Union (IoU) and confidence
scores to effectively identify informative training samples specific to
detection tasks. Extensive experiments on PASCAL VOC and MS COCO demonstrate
that our approach consistently outperforms prior dataset pruning methods in
terms of mean Average Precision (mAP). We also show that annotation count and
class distribution shift can influence detection performance, but selecting
informative examples is a more critical factor than dataset size or balance.
Our work bridges dataset pruning and object detection, paving the way for
dataset pruning in complex vision tasks.