Incremental model-based reinforcement learning with model constraint.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

In model-based reinforcement learning (RL), a model of the real environment is estimated from limited data and then used for policy optimization. Consequently, policy optimization in model-based RL is affected by updates to both the policy and the estimated model. In practice, previous model-based RL methods constrain only the policy updates to be incremental. This cannot guarantee that the overall update is incremental, which limits the algorithm's performance. To address this issue, we analyze the policy optimization procedure of model-based RL and propose an incremental model-based RL update scheme. The scheme comprises an incremental model constraint, which guarantees incremental updates to the estimated model, and an incremental policy constraint, which ensures incremental updates to the policy. Further, we establish a performance bound between the real environment and the estimated model that incorporates this update scheme and guarantees non-decreasing policy performance in the real environment. To implement the scheme, we develop a simple and efficient model-based RL algorithm, IMPO (Incremental Model-based Policy Optimization), which leverages previous knowledge to enhance stability during learning. Experimental results across various control benchmarks demonstrate that IMPO significantly outperforms previous state-of-the-art model-based RL methods in overall performance and sample efficiency.
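The abstract does not give the functional form of either constraint. The sketch below is a minimal illustration that assumes both are expressed as KL-divergence penalties between the previous and updated diagonal-Gaussian distributions. This is a common trust-region-style choice, not confirmed as IMPO's actual formulation. The names `gaussian_kl`, `incremental_penalty`, `policy_coef`, and `model_coef` are hypothetical. The penalty is zero when neither the policy nor the estimated model changes and grows as either one moves away from its previous version.

```python
import math
from typing import Sequence, Tuple

# A diagonal Gaussian given as (means, standard deviations), one entry per dimension.
Gaussian = Tuple[Sequence[float], Sequence[float]]


def gaussian_kl(mu_p: Sequence[float], std_p: Sequence[float],
                mu_q: Sequence[float], std_q: Sequence[float]) -> float:
    """Closed-form KL(p || q) for diagonal Gaussians, summed over dimensions."""
    total = 0.0
    for mp, sp, mq, sq in zip(mu_p, std_p, mu_q, std_q):
        total += math.log(sq / sp) + (sp ** 2 + (mp - mq) ** 2) / (2.0 * sq ** 2) - 0.5
    return total


def incremental_penalty(old_policy: Gaussian, new_policy: Gaussian,
                        old_model: Gaussian, new_model: Gaussian,
                        policy_coef: float = 1.0, model_coef: float = 1.0) -> float:
    """Hypothetical combined penalty (assumption, not the paper's exact form).

    - Incremental policy constraint: KL(old policy || new policy) at a state.
    - Incremental model constraint: KL(old model || new model) at a state-action pair.
    """
    policy_kl = gaussian_kl(*old_policy, *new_policy)
    model_kl = gaussian_kl(*old_model, *new_model)
    return policy_coef * policy_kl + model_coef * model_kl


# Usage: an unchanged policy and model incur no penalty; a shifted model mean does.
same = ([0.0], [1.0])
print(incremental_penalty(same, same, same, same))                 # 0.0
print(incremental_penalty(same, same, same, ([1.0], [1.0])))       # 0.5
```

In a full algorithm, a penalty of this kind would typically be added to the policy and model training losses so that each update stays close to the previous iterate. How IMPO actually weights or enforces the two constraints is described in the paper, not here.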

Authors

  • Zhiyou Yang
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, No. 2006 Xiyuan Ave, Chengdu, 611731, Sichuan, China.
  • Mingsheng Fu
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China. Electronic address: fms@uestc.edu.cn.
  • Hong Qu
    Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China.
  • Fan Li
    Department of Instrument Science and Engineering, School of SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Shuqing Shi
    Cardiovascular Department, Guang'anmen Hospital, China Academy of Chinese Medical Sciences.
  • Wang Hu
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, No. 2006 Xiyuan Ave, Chengdu, 611731, Sichuan, China.