Elastic Multi-Gradient Descent for Parallel Continual Learning.

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence

Abstract

Learning a single shared model across multiple tasks is a long-standing goal in machine learning, traditionally addressed through Multi-Task Learning (MTL) or Serial Continual Learning (SCL). However, MTL assumes fixed task availability, limiting adaptability to new tasks, while SCL suffers from delayed response due to its strictly sequential learning protocol. This paper studies the novel paradigm of Parallel Continual Learning (PCL) in dynamic multi-task scenarios, where a diverse set of tasks is encountered at different time points. PCL inherits the challenges of both MTL and SCL, namely, task conflict and catastrophic forgetting, compounded by the dynamic and asynchronous arrival of tasks. To address these challenges, we propose Elastic Multi-Gradient Descent (EMGD), which formulates PCL as a dynamic multi-objective optimization problem. EMGD introduces task-specific elastic factors to guide gradient updates toward Pareto-optimal directions, ensuring balanced learning across tasks. Additionally, we develop a gradient-guided memory editing mechanism that aligns rehearsal data with the optimized descent direction, mitigating memory-induced interference. Theoretical analysis shows that accumulation points of the EMGD update rule are Pareto critical under the proposed formulation, and extensive experiments on image classification benchmarks demonstrate that EMGD significantly outperforms existing methods, including state-of-the-art PCL, MTL, and SCL approaches.
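The idea of steering an update toward a direction that helps every task at once can be illustrated with the classic two-task minimum-norm rule from multiple-gradient descent. This is only a sketch of that general technique, not the EMGD update itself: the abstract does not specify how the elastic factors are computed, so they are not modeled here. The names `dot` and `min_norm_direction` and the toy gradients are assumptions made for illustration.

```python
def dot(u, v):
    """Inner product of two equal-length vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def min_norm_direction(g1, g2):
    """Return (alpha, d), where d = alpha*g1 + (1-alpha)*g2 has minimum norm
    over alpha in [0, 1]. A parameter step along -d does not increase either
    task loss to first order; d == 0 signals a Pareto-critical point.
    """
    diff = [a - b for a, b in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        # Identical gradients: any convex weighting gives the same direction.
        alpha = 0.5
    else:
        # Closed-form minimizer of ||g2 + alpha*(g1 - g2)||^2, clipped to [0, 1].
        alpha = dot([b - a for a, b in zip(g1, g2)], g2) / denom
        alpha = min(1.0, max(0.0, alpha))
    d = [alpha * a + (1.0 - alpha) * b for a, b in zip(g1, g2)]
    return alpha, d


# Hypothetical gradients of two partially conflicting tasks.
g_task1 = [1.0, 0.0]
g_task2 = [-1.0, 1.0]
alpha, d = min_norm_direction(g_task1, g_task2)

# Plain gradient step on hypothetical parameters using the combined direction.
theta = [0.0, 0.0]
lr = 0.1
theta = [t - lr * di for t, di in zip(theta, d)]
```

In this toy case the combined direction has a positive inner product with both task gradients, so a small step along its negative reduces both losses to first order even though the two gradients partially conflict.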
