Efficient learning rate adaptation based on hierarchical optimization approach.

Journal: Neural Networks: The Official Journal of the International Neural Network Society

Abstract

This paper proposes a new hierarchical approach to learning rate adaptation in gradient methods, called learning rate optimization (LRO). LRO formulates learning rate adaptation as a hierarchical optimization problem that minimizes the loss function with respect to the learning rate for the current model parameters and gradients. LRO then optimizes the learning rate using the alternating direction method of multipliers (ADMM). Because this learning rate optimization requires neither second-order information nor a probabilistic model, it is highly efficient. Furthermore, LRO introduces no additional hyperparameters compared to the vanilla gradient method with simple exponential learning rate decay. In the experiments, we integrated LRO with vanilla SGD and Adam and compared their optimization performance with state-of-the-art learning rate adaptation methods as well as the most commonly used adaptive gradient methods. SGD and Adam with LRO outperformed all competitors on benchmark datasets in image classification tasks.
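The sketch below is only meant to illustrate the hierarchical formulation described in the abstract: at each outer gradient step, an inner problem chooses the learning rate that minimizes the loss along the current gradient direction. It is not the paper's method; the ADMM-based inner solver is replaced here by a simple golden-section search on a toy quadratic loss, and all names (loss, best_lr, sgd_with_lr_opt) and settings are illustrative assumptions.

```python
# Hedged sketch of "learning rate as an inner optimization problem".
# NOT the paper's ADMM-based LRO: the inner problem is solved with a
# dependency-free golden-section search on a toy quadratic objective.

import numpy as np


def loss(w, A, b):
    """Toy quadratic loss 0.5 * ||A w - b||^2, standing in for a network loss."""
    r = A @ w - b
    return 0.5 * float(r @ r)


def grad(w, A, b):
    """Gradient of the toy loss with respect to w."""
    return A.T @ (A @ w - b)


def best_lr(w, g, A, b, lo=0.0, hi=1.0, iters=30):
    """Inner problem: minimize loss(w - eta * g) over eta in [lo, hi].
    Golden-section search is used here purely as a simple substitute for
    the ADMM update described in the abstract."""
    phi = (np.sqrt(5.0) - 1.0) / 2.0
    a, c = lo, hi
    for _ in range(iters):
        x1 = c - phi * (c - a)
        x2 = a + phi * (c - a)
        if loss(w - x1 * g, A, b) < loss(w - x2 * g, A, b):
            c = x2
        else:
            a = x1
    return 0.5 * (a + c)


def sgd_with_lr_opt(A, b, steps=50, seed=0):
    """Outer loop: vanilla gradient descent whose step size is chosen by
    solving the inner learning-rate problem at every iteration."""
    rng = np.random.default_rng(seed)
    w = rng.normal(size=A.shape[1])
    for t in range(steps):
        g = grad(w, A, b)
        eta = best_lr(w, g, A, b)
        w = w - eta * g
        if t % 10 == 0:
            print(f"step {t:3d}  lr={eta:.4f}  loss={loss(w, A, b):.6f}")
    return w


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    A = rng.normal(size=(100, 20))
    b = rng.normal(size=100)
    sgd_with_lr_opt(A, b)
```

Note that, as in the abstract's description, the inner search adds no tunable hyperparameters beyond the bracketing interval, whereas the actual efficiency claims of LRO rest on the ADMM formulation not reproduced here.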

Authors

  • Gyoung S Na
    Chemical Data-Driven Research Center, Korea Research Institute of Chemical Technology (KRICT), Daejeon 34114, Korea.