Streaming Krylov-Accelerated Stochastic Gradient Descent
Journal:
arXiv
Published Date:
May 11, 2025
Abstract
We present SKA-SGD (Streaming Krylov-Accelerated Stochastic Gradient
Descent), a novel optimization approach that accelerates convergence for
ill-conditioned problems by projecting stochastic gradients onto a
low-dimensional Krylov subspace. Directly inspired by recent advances in s-step
Conjugate Gradient methods with streaming Gauss-Seidel Gram solvers
\cite{dambra2025sstep}, our method extends these techniques to the stochastic
optimization domain. Our approach combines three key innovations: (1)
projection coefficients computed via a single streaming Gauss-Seidel iteration,
which is mathematically equivalent to Modified Gram-Schmidt orthogonalization;
(2) a Chebyshev polynomial basis for constructing the Krylov subspace,
providing superior numerical stability; and (3) efficient implementation for
AMD GPUs using HIP. We prove that our streaming approach achieves a backward
error near machine precision with $O(s^2)$ complexity rather than $O(s^3)$,
where $s$ is the Krylov subspace dimension. Experimental results demonstrate
that SKA-SGD significantly outperforms standard SGD and Adam in convergence
rate and final error, particularly for problems with condition numbers
exceeding $10^3$. GPU performance analysis reveals a crossover point where
communication-avoiding benefits outweigh computational overhead, typically
occurring at moderate scale ($p \approx 64$ processors) for problem sizes $n
\geq 10^6$.