An Efficient Training Algorithm for Models with Block-wise Sparsity
Journal:
arXiv
Published Date:
Mar 27, 2025
Abstract
Large-scale machine learning (ML) models are increasingly used in
critical domains such as education, lending, recruitment, healthcare, and
criminal justice. However, the training, deployment, and use of these
models demand substantial computational resources. To decrease computation and
memory costs, machine learning models with sparse weight matrices are widely
used in the literature. Among sparse models, those with special sparse
structures (e.g., models with block-wise sparse weight matrices) map better
onto hardware accelerators and can decrease memory and computation
costs during inference. Unfortunately, while there are several efficient
training methods, none of them is designed to train a block-wise sparse model
efficiently. As a result, current methods for training block-wise sparse
models start with full, dense models, which leads to inefficient training. In
this work, we focus on training models with block-wise sparse matrices
and propose an efficient training algorithm to decrease both computation and
memory costs during training and inference. In addition, we show that our
proposed method enables us to efficiently find the right block size for the
sparsity pattern during the training process. Our extensive empirical and
theoretical analyses show that our algorithms can decrease the computation and
memory costs significantly without a performance drop compared to baselines.
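
To make the notion of a block-wise sparse weight matrix concrete, the following is a minimal illustrative sketch, not the training algorithm proposed in this work. It assumes a matrix stored only as its nonzero blocks and shows why both memory and multiply-add counts scale with the number of nonzero blocks rather than with the full matrix size. The names block_sparse_matvec and stored_parameters, and the dictionary layout, are hypothetical choices made for this example.

```python
# Illustrative sketch only: a block-wise sparse matrix kept as its nonzero blocks.
# All names and the storage layout are assumptions, not taken from the paper.

def block_sparse_matvec(blocks, block_size, n_rows, x):
    """Compute y = W @ x where W is given only by its nonzero blocks.

    blocks maps (block_row, block_col) -> block_size x block_size nested list.
    Blocks absent from the dict are treated as all-zero and are skipped.
    """
    y = [0.0] * n_rows
    for (bi, bj), block in blocks.items():
        row0 = bi * block_size  # first matrix row covered by this block
        col0 = bj * block_size  # first matrix column covered by this block
        for r in range(block_size):
            acc = 0.0
            for c in range(block_size):
                acc += block[r][c] * x[col0 + c]
            y[row0 + r] += acc
    return y


def stored_parameters(blocks, block_size):
    """Number of weights actually stored (nonzero blocks only)."""
    return len(blocks) * block_size * block_size


# A 4x4 matrix with 2x2 blocks; only the two diagonal blocks are nonzero.
blocks = {
    (0, 0): [[1.0, 2.0], [3.0, 4.0]],
    (1, 1): [[5.0, 6.0], [7.0, 8.0]],
}
x = [1.0, 1.0, 1.0, 1.0]
y = block_sparse_matvec(blocks, block_size=2, n_rows=4, x=x)
n_stored = stored_parameters(blocks, block_size=2)
```

In this toy case the dense 4x4 matrix would hold 16 weights, while the block-wise sparse form stores 8 and performs 8 multiply-adds per matrix-vector product. Because each nonzero block is a small dense tile, the inner loops have the regular access pattern that hardware accelerators handle well.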