SGLP: A Similarity Guided Fast Layer Partition Pruning for Compressing Large Deep Models
Journal:
arXiv
Published Date:
Oct 14, 2024
Abstract
The deployment of deep neural networks (DNNs) on
resource-constrained devices remains a significant challenge due to their high
computational and parameter requirements. To solve this problem, layer pruning
has emerged as a potent approach to reduce network size and improve
computational efficiency. However, existing layer pruning methods mostly
overlook the intrinsic connections and inter-dependencies between different
layers within complicated deep neural networks. This oversight can result in
pruned models that do not preserve the essential characteristics of the
pre-trained network as effectively as desired. To address this limitation, we
propose a Similarity Guided fast Layer Partition pruning for compressing large
deep models (SGLP), which focuses on pruning layers from network segments
partitioned via representation similarity. Specifically, our method first
leverages Centered Kernel Alignment (CKA) to measure the similarity between the
internal representations of the layers of the pre-trained network, which
provides a solid basis for layer pruning. Based on the similarity matrix derived from
CKA, we employ Fisher Optimal Segmentation to partition the network into
multiple segments, which provides a basis for removing the layers in a
segment-wise manner. In addition, our method innovatively adopts GradNorm for
segment-wise layer importance evaluation, eliminating the need for extensive
fine-tuning, and finally prunes the unimportant layers to obtain a compact
network. Experimental results on image classification and large language
models (LLMs) demonstrate that our proposed SGLP outperforms the
state-of-the-art methods in both accuracy and computational efficiency,
presenting a more effective solution for deploying DNNs on resource-limited
platforms. Our code is available at
https://github.com/itsnotacie/information-fusion-SGLP.
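
As a rough illustration of the first two steps described in the abstract, the sketch below computes a pairwise CKA similarity matrix over per-layer activations and then partitions the ordered layers into contiguous segments with a Fisher-style dynamic program for ordered samples. It is not the authors' implementation. The linear form of CKA, the use of each layer's similarity row as its feature vector, and the names `linear_cka`, `cka_matrix`, and `fisher_segmentation` are assumptions made for illustration only. The GradNorm-based importance scoring is omitted.

```python
import numpy as np


def linear_cka(x, y):
    """Linear CKA between two activation matrices of shape (n_samples, n_features)."""
    x = x - x.mean(axis=0, keepdims=True)  # center every feature column
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, "fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, "fro")
    norm_y = np.linalg.norm(y.T @ y, "fro")
    return cross / (norm_x * norm_y)


def cka_matrix(activations):
    """Pairwise linear CKA for a list of per-layer activation matrices."""
    n = len(activations)
    sim = np.eye(n)
    for i in range(n):
        for j in range(i + 1, n):
            sim[i, j] = sim[j, i] = linear_cka(activations[i], activations[j])
    return sim


def fisher_segmentation(features, k):
    """Split the ordered rows of `features` into k contiguous segments
    (requires 1 <= k <= number of rows), minimizing the total
    within-segment sum of squared deviations.

    Returns the start index of every segment after the first.
    """
    n = len(features)
    # diam[i, j]: within-segment scatter of rows i..j (inclusive)
    diam = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            seg = features[i:j + 1]
            diam[i, j] = ((seg - seg.mean(axis=0)) ** 2).sum()
    # cost[m, j]: best loss when rows 0..j are split into m + 1 segments
    cost = np.full((k, n), np.inf)
    start = np.zeros((k, n), dtype=int)
    cost[0] = diam[0]
    for m in range(1, k):
        for j in range(m, n):
            for s in range(m, j + 1):  # s is the first row of the last segment
                c = cost[m - 1, s - 1] + diam[s, j]
                if c < cost[m, j]:
                    cost[m, j], start[m, j] = c, s
    # Walk back through the stored split points to recover the boundaries.
    bounds, j = [], n - 1
    for m in range(k - 1, 0, -1):
        s = start[m, j]
        bounds.append(int(s))
        j = s - 1
    return sorted(bounds)
```

Under these assumptions, `fisher_segmentation(cka_matrix(acts), k)` takes a list `acts` of per-layer activation matrices collected on the same inputs and returns the layer indices at which new segments begin. The number of segments `k` is left to the caller here.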