Unifying and revisiting Sharpness-Aware Minimization with noise-injected micro-batch scheduler for efficiency improvement.

Journal: Neural Networks: The Official Journal of the International Neural Network Society

Published Date: Feb 3, 2025

Abstract

Sharpness-aware minimization (SAM) has been proposed to improve generalization by encouraging the model to converge to a flatter region of the loss landscape. However, SAM's two sequential gradient computations make each step roughly 2× as expensive as a step of the base optimizer (e.g., SGD). Recent works improve SAM's efficiency either by switching between SAM and the base optimizer or by reducing the number of data samples. In this paper, we first propose the micro-batch scheduler to unify these two ideas, observing that what they have in common is the use of a smaller micro-batch to approximate the perturbation. The role of this approximation, however, has not been fully explored. We therefore revisit the effect of the micro-batch approximated perturbation on accuracy and efficiency, and empirically observe that a too-small micro-batch degrades accuracy because it leads to a sharper loss landscape. To alleviate this, we inject random noise into the micro-batch approximated gradient in SAM's first (ascent) step, which implicitly leverages random perturbation before SAM's second (descent) step. Visualization results confirm that this encourages the model to converge to a flatter region. Extensive experiments with various models (e.g., ResNet-18/50, WideResNet-28-10, PyramidNet-110, and ViT-B/16) evaluated on CIFAR-10 and ImageNet-1K show that the proposed method achieves competitive accuracy with higher efficiency than several efficient SAM variants (e.g., ESAM, LooKSAM-5, AE-SAM, and K-SAM).
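
The two-step procedure described in the abstract can be illustrated with a minimal NumPy sketch on a toy least-squares problem. This is not the authors' implementation: the abstract does not specify the noise distribution, how the micro-batch size is scheduled, or how the perturbation is normalized. The function names, the fixed `micro_size`, the Gaussian noise with scale `noise_std`, and the perturbation radius `rho` are all assumptions chosen for illustration.

```python
import numpy as np


def loss_and_grad(w, X, y):
    """Mean squared error of a linear model and its gradient w.r.t. w."""
    resid = X @ w - y
    loss = 0.5 * np.mean(resid ** 2)
    grad = X.T @ resid / len(y)
    return loss, grad


def noisy_microbatch_sam_step(w, X, y, rng, lr=0.1, rho=0.05,
                              micro_size=8, noise_std=0.01):
    """One SAM-style update with a noise-injected micro-batch ascent step.

    All hyperparameters are hypothetical; the paper's scheduler may vary
    the micro-batch size over training, which this sketch does not do.
    """
    # Ascent step: approximate the perturbation direction from a small
    # micro-batch instead of the full mini-batch (the cheaper gradient).
    idx = rng.choice(len(y), size=min(micro_size, len(y)), replace=False)
    _, g_micro = loss_and_grad(w, X[idx], y[idx])

    # Assumed form of the noise injection: additive isotropic Gaussian noise
    # on the micro-batch gradient before it is normalized.
    g_micro = g_micro + noise_std * rng.standard_normal(g_micro.shape)

    # Standard SAM perturbation: move a distance rho along the ascent direction.
    eps = rho * g_micro / (np.linalg.norm(g_micro) + 1e-12)

    # Descent step: full mini-batch gradient evaluated at the perturbed
    # weights, applied to the original weights.
    _, g_sam = loss_and_grad(w + eps, X, y)
    return w - lr * g_sam


# Toy usage: recover the weights of a noisy linear model.
rng = np.random.default_rng(0)
X = rng.standard_normal((64, 5))
y = X @ np.ones(5) + 0.01 * rng.standard_normal(64)

w = np.zeros(5)
initial_loss, _ = loss_and_grad(w, X, y)
for _ in range(200):
    w = noisy_microbatch_sam_step(w, X, y, rng)
final_loss, _ = loss_and_grad(w, X, y)
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

In this sketch the ascent gradient touches only `micro_size` samples, which is where the efficiency gain over standard SAM would come from; the descent gradient still uses the full batch.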

Authors

Zheng Wei

Ministry of Education Key Laboratory of Bioinformatics, Center for Synthetic and Systems Biology, Beijing National Research Center for Information Science and Technology, Department of Automation, Tsinghua University, Beijing 100084, China.

Xingjun Zhang

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China. Electronic address: xjzhang@xjtu.edu.cn.

Zhendong Tan

School of Computer Science and Technology, Xi'an Jiaotong University, Xi'an 710049, China.

Keywords

Algorithms; Computer Simulation; Neural Networks, Computer

External Resources

PubMed ID: 39922159
