Quantized Convolutional Neural Networks Robustness under Perturbation.
Journal:
F1000Research
Published Date:
Apr 9, 2025
Abstract
Contemporary machine learning models are increasingly constrained by their size and the number of operations required per forward pass, which in turn drives up compute requirements. Quantization has emerged as a convenient way to address this: weights and activations are mapped from their conventional 32-bit floating-point representations to lower-precision integers. This process significantly reduces inference time and simplifies hardware requirements. It is a well-established result that the performance of such reduced-precision models is comparable to that of their floating-point counterparts. However, little of the literature addresses the performance of quantized models under a perturbed input space, the kind of stress testing commonly applied to full-precision models, particularly ahead of real-world deployment. We address this gap in the context of 8-bit quantized convolutional neural networks (CNNs). We study three state-of-the-art CNNs: ResNet-18, VGG-16, and SqueezeNet1_1, and subject their floating-point and fixed-point forms to various noise regimes at varying intensities. We characterize performance in terms of traditional metrics, including top-1 and top-5 accuracy as well as the F1 score. We also introduce a new metric, the Kullback-Leibler divergence between the output distributions of a given floating-point/fixed-point model pair, as a means of examining how the model's output distribution changes as a result of quantization; we contend this can be interpreted as a proxy for similarity in decision making between the two models. We find that across all three models and under each perturbation scheme, the relative error between the quantized and full-precision models was consistently low. We also find that the Kullback-Leibler divergence was of the same order of magnitude as in the unperturbed tests across all perturbation regimes except Brownian noise, where significant divergences were observed for VGG-16 and SqueezeNet1_1.
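The two quantities central to the study can be made concrete with a short sketch. The snippet below illustrates (a) a generic affine mapping of a float32 tensor onto the signed 8-bit integer range, and (b) the Kullback-Leibler divergence between the class distributions produced by a floating-point/fixed-point model pair on the same inputs. This is a minimal PyTorch sketch under assumed conventions (an asymmetric scale/zero-point scheme clamped to [-128, 127], softmax over class logits, batch-mean reduction) and is illustrative rather than the authors' exact implementation.

```python
import torch
import torch.nn.functional as F


def affine_quantize_int8(x: torch.Tensor):
    """Illustrative affine quantization of a float32 tensor to int8.

    Assumed scheme: scale and zero-point are chosen so that the observed
    range [x.min(), x.max()] maps onto the int8 range [-128, 127].
    """
    qmin, qmax = -128, 127
    scale = (x.max() - x.min()) / (qmax - qmin)
    zero_point = qmin - torch.round(x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax).to(torch.int8)
    return q, scale, zero_point


def dequantize(q: torch.Tensor, scale, zero_point):
    """Map int8 values back to approximate float32 values."""
    return scale * (q.float() - zero_point)


def output_kl_divergence(fp32_logits: torch.Tensor, int8_logits: torch.Tensor):
    """KL(P_fp32 || P_int8), averaged over the batch, where P = softmax(logits).

    Serves as a proxy for how far quantization shifts the model's output
    distribution on identical inputs.
    """
    log_p = F.log_softmax(fp32_logits, dim=1)   # reference (floating-point) model
    log_q = F.log_softmax(int8_logits, dim=1)   # quantized (fixed-point) model
    # F.kl_div(input=log_q, target=log_p, log_target=True) computes
    # sum_i p_i * (log p_i - log q_i), i.e. KL(P || Q).
    return F.kl_div(log_q, log_p, log_target=True, reduction="batchmean")
```

In use, one would feed the same clean or perturbed validation batch through both models, collect the logits, and compare the resulting divergence against the unperturbed baseline, mirroring the comparison described in the abstract.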