Arch-Net: Model conversion and quantization for architecture agnostic model deployment.
Journal:
Neural networks : the official journal of the International Neural Network Society
Published Date:
Mar 18, 2025
Abstract
The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well-supported on many popular chips, and the efficiency of 7 × 7 convolution is significantly lower than the equivalent three 3 × 3 convolution. Therefore, in this paper, we introduce Arch-Net, a neural network framework comprised exclusively of a select few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resultant Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results from image classification and machine translation tasks demonstrate that using only a few types of operators in Arch-Net can achieve results comparable to those obtained with complex architectures. This provides a new insight for deploying structure-agnostic neural networks on various ASIC chips.