A reinforcement learning algorithm acquires demonstration from the training agent by dividing the task space.
Journal: Neural Networks: The Official Journal of the International Neural Network Society
Published Date: May 5, 2023
Abstract
Although reinforcement learning (RL) has made numerous breakthroughs in recent years, reward-sparse environments remain challenging and call for further exploration. Many studies improve agent performance by introducing state-action pairs experienced by an expert. However, such strategies depend heavily on the quality of the expert demonstration, which is rarely optimal in real-world environments, and they struggle to learn from sub-optimal demonstrations. In this paper, a self-imitation learning algorithm based on task space division is proposed to acquire high-quality demonstrations efficiently during training. To assess trajectory quality, well-designed criteria are defined in the task space to identify better demonstrations. The results show that the proposed algorithm improves the success rate of robot control and achieves a high mean Q value per step. The proposed framework shows great potential for learning from demonstrations generated by the agent's own policy and can be applied in reward-sparse environments where the task space can be divided.
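To make the idea concrete, below is a minimal Python sketch of demonstration selection by task-space division. It is an illustration under stated assumptions, not the paper's exact method: the `TaskSpaceDemoBuffer` name, the uniform grid partition of the task space, and the return-plus-length scoring rule are all hypothetical stand-ins for the paper's quality criteria.

```python
import numpy as np

class TaskSpaceDemoBuffer:
    """Illustrative buffer for self-imitation learning with task-space
    division: the task space is split into a uniform grid of regions, and
    the best self-generated trajectory reaching each region (judged by
    episode return, with shorter length breaking ties) is kept as a
    demonstration. This is a sketch, not the paper's exact design."""

    def __init__(self, low, high, bins_per_dim=4):
        self.low = np.asarray(low, dtype=np.float64)
        self.high = np.asarray(high, dtype=np.float64)
        self.bins = bins_per_dim
        self.best = {}  # region index -> (score, trajectory)

    def _region(self, point):
        # Map a task-space point (e.g. the achieved goal) to a grid cell.
        frac = (np.asarray(point) - self.low) / (self.high - self.low)
        cell = np.clip((frac * self.bins).astype(int), 0, self.bins - 1)
        return tuple(cell)

    def maybe_add(self, trajectory, achieved_point, episode_return):
        """Keep the trajectory if it beats the stored one for its region.
        The (return, -length) tuple is an assumed quality criterion."""
        key = self._region(achieved_point)
        score = (episode_return, -len(trajectory))
        if key not in self.best or score > self.best[key][0]:
            self.best[key] = (score, trajectory)

    def sample_demo(self, rng=None):
        """Sample one stored demonstration to imitate during training."""
        if not self.best:
            return None
        rng = rng or np.random.default_rng()
        key = list(self.best)[rng.integers(len(self.best))]
        return self.best[key][1]
```

Under this reading, the agent would call `maybe_add` after each episode so the buffer retains one high-quality trajectory per region, and the RL update would periodically sample stored demonstrations for an imitation term alongside the standard objective.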