Efficient Training of Deep Networks using Guided Spectral Data Selection: A Step Toward Learning What You Need
Journal: arXiv
Published Date: Jul 6, 2025
Abstract
Effective data curation is essential for optimizing neural network training.
In this paper, we present the Guided Spectrally Tuned Data Selection (GSTDS)
algorithm, which dynamically adjusts the subset of data points used for
training using an off-the-shelf pre-trained reference model. Based on a
pre-scheduled filtering ratio, GSTDS effectively reduces the number of data
points processed per batch. The proposed method ensures an efficient selection
of the most informative data points for training while avoiding redundant or
less beneficial computations. The data points kept in each batch are chosen by
spectral analysis: a Fiedler vector-based scoring mechanism ranks the points,
and the filtered portion of the batch is discarded, which lightens the resource
requirements of training. The proposed data selection approach not only streamlines the
training process but also promotes improved generalization and accuracy.
Extensive experiments on standard image classification benchmarks, including
CIFAR-10, Oxford-IIIT Pet, and Oxford-Flowers, demonstrate that GSTDS
outperforms standard training scenarios and JEST, a recent state-of-the-art
data curation method, on several key factors. GSTDS reduces computational
requirements by up to a factor of four without compromising performance.
Under a limited computational budget, it also reaches considerably higher
accuracy than the other methods. These promising results underscore the potential of
spectral-based data selection as a scalable solution for resource-efficient
deep learning and motivate further exploration into adaptive data curation
strategies. The code is available at https://github.com/rezasharifi82/GSTDS.
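
To make the batch-filtering idea concrete, the following is a minimal sketch of Fiedler vector-based selection, not the authors' implementation. The abstract does not say how the graph is built or how Fiedler components become scores, so everything specific here is an assumption: the Gaussian-kernel similarity graph over reference-model features, the `sigma` bandwidth, ranking by absolute Fiedler component, and the helper names `fiedler_vector` and `select_batch`. The `features` array stands in for per-sample outputs of the pre-trained reference model.

```python
import numpy as np


def fiedler_vector(features, sigma=1.0):
    """Return the Fiedler vector of a similarity graph over one batch.

    ASSUMPTION: the graph is a dense Gaussian-kernel graph; the paper
    may construct it differently.
    """
    # Pairwise squared Euclidean distances, clipped against round-off.
    sq_norms = np.sum(features ** 2, axis=1)
    dist2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * features @ features.T
    dist2 = np.maximum(dist2, 0.0)

    # Weighted adjacency without self-loops.
    weights = np.exp(-dist2 / (2.0 * sigma ** 2))
    np.fill_diagonal(weights, 0.0)

    # Unnormalized graph Laplacian L = D - W.
    laplacian = np.diag(weights.sum(axis=1)) - weights

    # eigh returns eigenvalues in ascending order, so column 1 is the
    # eigenvector of the second-smallest eigenvalue (the Fiedler vector).
    _, eigvecs = np.linalg.eigh(laplacian)
    return eigvecs[:, 1]


def select_batch(features, filter_ratio, sigma=1.0):
    """Return sorted indices of the samples kept for training.

    ASSUMPTION: samples are ranked by the magnitude of their Fiedler
    component and the lowest-ranked `filter_ratio` fraction is dropped.
    """
    n = features.shape[0]
    n_keep = max(1, int(round(n * (1.0 - filter_ratio))))
    scores = np.abs(fiedler_vector(features, sigma=sigma))
    keep = np.argsort(-scores)[:n_keep]
    return np.sort(keep)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    batch_features = rng.normal(size=(16, 8))  # stand-in for reference-model features
    kept = select_batch(batch_features, filter_ratio=0.75)
    print(kept)
```

In a training loop, `filter_ratio` would follow the pre-scheduled filtering ratio mentioned above, and only the rows indexed by the returned array would go through the forward and backward pass. With a ratio of 0.75, a quarter of each batch is processed, which matches the order of the fourfold reduction the abstract reports.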