Arch-Net: Model conversion and quantization for architecture agnostic model deployment.

Journal: Neural networks : the official journal of the International Neural Network Society

Published Date: Mar 18, 2025

Abstract

The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well supported on many popular chips, and a 7 × 7 convolution runs significantly less efficiently than an equivalent stack of three 3 × 3 convolutions. Therefore, in this paper, we introduce Arch-Net, a neural network framework composed exclusively of a select few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resultant Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results from image classification and machine translation tasks demonstrate that using only a few types of operators in Arch-Net can achieve results comparable to those obtained with complex architectures. This offers new insight into deploying structure-agnostic neural networks on various ASIC chips.
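The abstract's comparison of one 7 × 7 convolution with three 3 × 3 convolutions rests on standard receptive-field arithmetic, which the short sketch below works through. It is an illustration only and not code from the paper: the helper names and the channel width of 64 are hypothetical, and "equivalent" here means only that the two cover the same input window at stride 1, not that they compute the same function.

```python
def stacked_receptive_field(kernel_sizes):
    """Receptive field of stride-1 convolutions applied one after another."""
    rf = 1
    for k in kernel_sizes:
        # Each stride-1 layer widens the window by (kernel size - 1).
        rf += k - 1
    return rf


def conv_weight_count(kernel_size, channels):
    """Weights in one square convolution with `channels` in and out (bias ignored)."""
    return kernel_size * kernel_size * channels * channels


channels = 64  # hypothetical width, chosen only for illustration

rf_single = stacked_receptive_field([7])
rf_stack = stacked_receptive_field([3, 3, 3])

weights_single = conv_weight_count(7, channels)
weights_stack = 3 * conv_weight_count(3, channels)

print(rf_single, rf_stack)            # 7 7
print(weights_single, weights_stack)  # 200704 110592
```

Under these assumptions the stack sees the same 7 × 7 window while holding 27 rather than 49 weights per channel pair, which is one reason a framework restricted to 3 × 3 convolutions can still cover larger kernels.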

Authors

Shuangkang Fang

School of Electrical and Information Engineering, Beihang University, Beijing, 100191, China. Electronic address: skfang@buaa.edu.cn.

Weixin Xu

Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: xuweixin02@megvii.com.

Zipeng Feng

Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: fengzipeng@megvii.com.

Song Yuan

Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: yuansong@megvii.com.

Yufeng Wang

People's Hospital of Gaoxin, 768 Fudong Road, Weifang 261205, China.

Yi Yang

Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan, China.

Wenrui Ding

Institute of Unmanned System, Beihang University, Beijing, 100191, China. Electronic address: ding@buaa.edu.cn.

Shuchang Zhou

Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: zsc@megvii.com.

Keywords

Algorithms; Humans; Neural Networks, Computer

External Resources

PubMed ID: 40120552
