How Weight Resampling and Optimizers Shape the Dynamics of Continual Learning and Forgetting in Neural Networks
Journal:
arXiv
Published Date:
Jul 2, 2025
Abstract
Recent work in continual learning has highlighted the beneficial effect of
resampling weights in the last layer of a neural network ("zapping"). Although
empirical results demonstrate the effectiveness of this approach, the
underlying mechanisms that drive these improvements remain unclear. In this
work, we investigate in detail the patterns of learning and forgetting that take
place inside a convolutional neural network when trained in challenging
settings such as continual learning and few-shot transfer learning, with
handwritten characters and natural images. Our experiments show that models
that have undergone zapping during training recover more quickly from the shock
of transferring to a new domain. Furthermore, to better observe the effect of
continual learning in a multi-task setting, we measure how each individual task
is affected. This shows that not only zapping but also the choice of optimizer can
deeply affect the dynamics of learning and forgetting, causing complex
patterns of synergy/interference between tasks to emerge when the model learns
sequentially at transfer time.
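
As a rough illustration of what resampling the last layer's weights could look like, the following is a minimal sketch and not the authors' implementation. The network is represented as a plain list of weight matrices. The function names `init_weights` and `zap_last_layer`, the Gaussian initializer with scale sqrt(2 / n_in), and the optional `rows` argument are all assumptions made for this example; the abstract does not specify the initialization distribution or whether every output unit is resampled.

```python
import random


def init_weights(n_out, n_in, rng):
    """Draw an n_out x n_in weight matrix from a zero-mean Gaussian."""
    # Assumption: He-style scale sqrt(2 / n_in); the source does not state the initializer.
    scale = (2.0 / n_in) ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(n_in)] for _ in range(n_out)]


def zap_last_layer(layers, rng, rows=None):
    """Resample weights of the final layer in place; earlier layers are untouched.

    rows: optional iterable of output-unit indices to resample (hypothetical
    option); by default every row of the last layer is redrawn.
    """
    last = layers[-1]
    n_out, n_in = len(last), len(last[0])
    fresh = init_weights(n_out, n_in, rng)
    targets = range(n_out) if rows is None else rows
    for r in targets:
        last[r] = fresh[r]
    return layers


# Toy two-layer network: 3 inputs -> 4 hidden units -> 2 outputs.
rng = random.Random(0)
layers = [init_weights(4, 3, rng), init_weights(2, 4, rng)]
before_hidden = [row[:] for row in layers[0]]
before_last = [row[:] for row in layers[1]]

zap_last_layer(layers, rng)  # hidden layer is kept, output layer is redrawn
```

In a training loop, a call like this would be interleaved with ordinary optimizer updates, so that the earlier layers keep learning while the final layer is periodically reset.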