Arch-Net: Model conversion and quantization for architecture agnostic model deployment.

Journal: Neural networks : the official journal of the International Neural Network Society
Abstract

The significant computational demands of Deep Neural Networks (DNNs) present a major challenge for their practical application. Recently, many Application-Specific Integrated Circuit (ASIC) chips have incorporated dedicated hardware support for neural network acceleration. However, the lengthy development cycle of ASIC chips means they often lag behind the latest advances in neural architecture research. For instance, Layer Normalization is not well-supported on many popular chips, and the efficiency of a 7 × 7 convolution is significantly lower than that of an equivalent stack of three 3 × 3 convolutions. Therefore, in this paper, we introduce Arch-Net, a neural network framework composed exclusively of a select few common operators, namely 3 × 3 Convolution, 2 × 2 Max-pooling, Batch Normalization, Fully Connected layers, and Concatenation, which are efficiently supported across the majority of ASIC architectures. To facilitate the conversion of disparate network architectures into Arch-Net, we propose the Arch-Distillation methodology, which incorporates strategies such as Residual Feature Adaptation and a Teacher Attention Mechanism. These mechanisms enable effective conversion between different network structures alongside efficient model quantization. The resultant Arch-Net eliminates unconventional network constructs while maintaining robust performance even under sub-8-bit quantization, thereby enhancing compatibility and deployment efficiency. Empirical results on image classification and machine translation tasks demonstrate that the few operator types used in Arch-Net can achieve results comparable to those obtained with complex architectures. This offers new insight into deploying structure-agnostic neural networks on various ASIC chips.
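The claimed equivalence between one 7 × 7 convolution and three stacked 3 × 3 convolutions follows from standard receptive-field arithmetic: each stride-1 3 × 3 layer grows the receptive field by 2, so three of them cover the same 7 × 7 window while using fewer parameters (27C² vs. 49C² weights for C input/output channels). The sketch below is ours, not the paper's code; `receptive_field` is a hypothetical helper illustrating the arithmetic.

```python
# Illustrative sketch (not from the paper): receptive-field arithmetic
# showing why three stacked 3x3 convolutions can stand in for one 7x7.
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, applied in order.

    Returns the receptive field of one output unit on the input.
    """
    r, jump = 1, 1          # start: one pixel, unit step between samples
    for k, s in layers:
        r += (k - 1) * jump  # each layer widens the field by (k-1)*jump
        jump *= s            # stride compounds the effective step size
    return r

print(receptive_field([(7, 1)]))      # single 7x7 conv  -> 7
print(receptive_field([(3, 1)] * 3))  # three 3x3 convs  -> 7

# Parameter comparison for C input/output channels (biases ignored):
C = 64
print(7 * 7 * C * C)          # one 7x7 layer:   49 * C^2 weights
print(3 * (3 * 3 * C * C))    # three 3x3 layers: 27 * C^2 weights
```

The stacked form also interleaves extra nonlinearities and maps onto the small fixed-size convolution units that most ASIC accelerators optimize for, which is the motivation behind restricting Arch-Net to 3 × 3 kernels.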

Authors

  • Shuangkang Fang
    School of Electrical and Information Engineering, Beihang University, Beijing, 100191, China. Electronic address: skfang@buaa.edu.cn.
  • Weixin Xu
Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: xuweixin02@megvii.com.
  • Zipeng Feng
Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: fengzipeng@megvii.com.
  • Song Yuan
Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: yuansong@megvii.com.
  • Yufeng Wang
    People's Hospital of Gaoxin, 768 Fudong Road, Weifang 261205, China.
  • Yi Yang
    Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
  • Wenrui Ding
    Institute of Unmanned System, Beihang University, Beijing, 100191, China. Electronic address: ding@buaa.edu.cn.
  • Shuchang Zhou
Megvii Research, Megvii Inc., Beijing, 100096, China. Electronic address: zsc@megvii.com.