A reinforcement learning algorithm acquires demonstrations from the training agent by dividing the task space.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

Although reinforcement learning (RL) has made numerous breakthroughs in recent years, reward-sparse environments remain challenging and call for further exploration. Many studies improve agent performance by introducing state-action pairs experienced by an expert. However, such strategies depend heavily on the quality of the expert's demonstration, which is rarely optimal in real-world environments, and they struggle to learn from sub-optimal demonstrations. In this paper, a self-imitation learning algorithm based on task space division is proposed to acquire high-quality demonstrations efficiently during the training process. To assess trajectory quality, well-designed criteria are defined in the task space for identifying better demonstrations. The results show that the proposed algorithm improves the success rate of robot control and achieves a high mean Q value per step. The proposed framework shows great potential for learning from demonstrations generated by the agent's own policy in sparse-reward environments and can be applied in any reward-sparse setting where the task space can be divided.
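
The mechanism sketched in the abstract can be illustrated briefly. Below is a minimal, hypothetical Python sketch of the idea as described: partition the task (goal) space into discrete regions, keep the best trajectory the agent itself has generated in each region, and replay those trajectories as demonstrations. The names (region_of, DemoBuffer) and the return-based quality criterion are illustrative assumptions, not the authors' actual criteria or implementation.

    import random

    def region_of(goal, bins=4, low=-1.0, high=1.0):
        """Map a continuous goal in task space to a discrete region id."""
        return tuple(
            min(bins - 1, max(0, int((g - low) / (high - low) * bins)))
            for g in goal
        )

    class DemoBuffer:
        """Keeps the highest-quality self-generated trajectory per region."""

        def __init__(self):
            self.best = {}  # region id -> (quality, trajectory)

        def consider(self, trajectory, goal):
            # One simple stand-in quality criterion: undiscounted episode return.
            quality = sum(reward for (_, _, reward) in trajectory)
            key = region_of(goal)
            if key not in self.best or quality > self.best[key][0]:
                self.best[key] = (quality, trajectory)

        def sample(self):
            # Draw a stored demonstration uniformly over the regions covered so far.
            if not self.best:
                return None
            _, trajectory = random.choice(list(self.best.values()))
            return trajectory

In such a scheme, each finished episode would be offered to consider, and an auxiliary imitation loss would be computed on trajectories returned by sample; because the demonstrations come from the agent's own policy, no external (and possibly sub-optimal) expert is required.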

Authors

  • Lipeng Zu
    State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, 110169, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
  • Xiao He
    Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland. xiao.he@bsse.ethz.ch.
  • Jia Yang
    Operating Room, Cheng'an County People's Hospital, Handan, Hebei, China.
  • Lianqing Liu
  • Wenxue Wang