A reinforcement learning algorithm acquires demonstrations from the training agent by dividing the task space.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

Although reinforcement learning (RL) has made numerous breakthroughs in recent years, reward-sparse environments remain challenging and call for further exploration. Many studies improve agent performance by introducing state-action pairs experienced by an expert. However, such strategies depend heavily on the quality of the expert's demonstration, which is rarely optimal in real-world environments, and they struggle to learn from sub-optimal demonstrations. In this paper, a self-imitation learning algorithm based on task space division is proposed to acquire high-quality demonstrations efficiently during the training process. To assess trajectory quality, well-designed criteria are defined in the task space for identifying better demonstrations. The results show that the proposed algorithm improves the success rate of robot control and achieves a high mean Q value per step. The proposed framework shows great potential for learning from demonstrations generated by the agent's own policy in sparse-reward environments and can be applied in any reward-sparse setting where the task space can be divided.
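
The mechanism sketched in the abstract can be illustrated briefly. Below is a minimal, hypothetical Python sketch of the idea as described: partition the task (goal) space into discrete regions, keep the best trajectory the agent itself has generated in each region, and replay those trajectories as demonstrations. The names (region_of, DemoBuffer) and the return-based quality criterion are illustrative assumptions, not the authors' actual criteria or implementation.

    import random

    def region_of(goal, bins=4, low=-1.0, high=1.0):
        """Map a continuous goal in task space to a discrete region id."""
        return tuple(
            min(bins - 1, max(0, int((g - low) / (high - low) * bins)))
            for g in goal
        )

    class DemoBuffer:
        """Keeps the highest-quality self-generated trajectory per region."""

        def __init__(self):
            self.best = {}  # region id -> (quality, trajectory)

        def consider(self, trajectory, goal):
            # One simple stand-in quality criterion: undiscounted episode return.
            quality = sum(reward for (_, _, reward) in trajectory)
            key = region_of(goal)
            if key not in self.best or quality > self.best[key][0]:
                self.best[key] = (quality, trajectory)

        def sample(self):
            # Draw a stored demonstration uniformly over the regions covered so far.
            if not self.best:
                return None
            _, trajectory = random.choice(list(self.best.values()))
            return trajectory

In such a scheme, each finished episode would be offered to consider, and an auxiliary imitation loss would be computed on trajectories returned by sample; because the demonstrations come from the agent's own policy, no external (and possibly sub-optimal) expert is required.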

Authors

  • Lipeng Zu
    State Key Laboratory of Robotics, Shenyang Institute of Automation, Chinese Academy of Sciences, Shenyang, 110016, China; Institutes for Robotics and Intelligent Manufacturing, Chinese Academy of Sciences, Shenyang, 110169, China; University of Chinese Academy of Sciences, Beijing, 100049, China.
  • Xiao He
    Department of Biosystems Science and Engineering, ETH Zurich, Basel, Switzerland. xiao.he@bsse.ethz.ch.
  • Jia Yang
    Operating Room, Cheng'an County People's Hospital, Handan, Hebei, China.
  • Lianqing Liu
  • Wenxue Wang