Decentralized stochastic sharpness-aware minimization algorithm.

Journal: Neural Networks: the official journal of the International Neural Network Society
PMID:

Abstract

In recent years, distributed stochastic algorithms have become increasingly useful in the field of machine learning. However, like traditional stochastic algorithms, they face the challenge that achieving a low loss on the training set does not necessarily translate into good performance on the test set. To address this issue, we propose the use of a distributed network topology to improve the generalization ability of such algorithms. We specifically focus on the Sharpness-Aware Minimization (SAM) algorithm, which perturbs the weights toward the worst-case (maximum) loss point in a neighborhood in order to favor solutions with better generalization ability. In this paper, we present the decentralized stochastic sharpness-aware minimization (D-SSAM) algorithm, which incorporates the distributed network topology. We also provide sublinear convergence results for non-convex objectives, comparable to those of Decentralized Stochastic Gradient Descent (DSGD). Finally, we empirically demonstrate the effectiveness of these results on deep networks and discuss their relationship to the generalization behavior of SAM.
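To make the idea concrete, the following is a minimal sketch (not the authors' implementation) of how a D-SSAM-style update could combine the SAM ascent perturbation with decentralized gossip averaging. The function and variable names (`d_ssam_step`, `mix`, `rho`) are illustrative assumptions; the demo uses a toy quadratic objective per node on a 3-node ring.

```python
import numpy as np

def d_ssam_step(W, grads_fn, mix, rho=0.05, lr=0.1):
    """One illustrative D-SSAM-style step.

    Each node i perturbs its weights in the ascent direction (the SAM
    inner maximization), evaluates the stochastic gradient at the
    perturbed point, then mixes with its neighbors via the doubly
    stochastic gossip matrix `mix` and takes a descent step.
    """
    n, _ = W.shape
    G = np.zeros_like(W)
    for i in range(n):
        g = grads_fn(i, W[i])                          # local gradient at node i
        eps = rho * g / (np.linalg.norm(g) + 1e-12)    # SAM ascent perturbation
        G[i] = grads_fn(i, W[i] + eps)                 # gradient at perturbed point
    return mix @ W - lr * G                            # gossip averaging + descent

# Toy demo: node i minimizes 0.5 * ||w - c_i||^2 on a 3-node ring;
# the consensus optimum is (approximately) the mean of the c_i.
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
grads_fn = lambda i, w: w - centers[i]
mix = np.array([[0.50, 0.25, 0.25],
                [0.25, 0.50, 0.25],
                [0.25, 0.25, 0.50]])                   # doubly stochastic weights
W = np.zeros((3, 2))
for _ in range(200):
    W = d_ssam_step(W, grads_fn, mix)
```

After a few hundred iterations, the node averages settle near the consensus optimum (the mean of the `c_i`, here the origin), offset by a small O(rho) bias from the SAM perturbation; the gossip step keeps the nodes' iterates close to one another.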

Authors

  • Simiao Chen
    Heidelberg Institute of Global Health, Heidelberg, Germany.
  • Xiaoge Deng
    College of Computer, National University of Defense Technology, Changsha, Hunan, China.
  • Dongpo Xu
    School of Mathematics and Statistics, Northeast Normal University, Changchun 130024, China; College of Science, Harbin Engineering University, Harbin 150001, China. Electronic address: dongpoxu@gmail.com.
  • Tao Sun
    Janssen Research & Development, LLC, Raritan, NJ, USA.
  • Dongsheng Li
    Institute of Biomedical Engineering, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China.