End-to-End Hierarchical Reinforcement Learning With Integrated Subgoal Discovery.

Journal: IEEE Transactions on Neural Networks and Learning Systems

Abstract

Hierarchical reinforcement learning (HRL) is a promising approach for performing long-horizon goal-reaching tasks by decomposing the goals into subgoals. In a holistic HRL paradigm, an agent must autonomously discover such subgoals and also learn a hierarchy of policies that uses them to reach the goals. Recently introduced end-to-end HRL methods accomplish this by using the higher-level policy in the hierarchy to directly search for useful subgoals in a continuous subgoal space. However, learning such a policy may be challenging when the subgoal space is large. We propose integrated discovery of salient subgoals (LIDOSS), an end-to-end HRL method with an integrated subgoal discovery heuristic that reduces the search space of the higher-level policy by explicitly focusing on subgoals that have a greater probability of occurrence on state-transition trajectories leading to the goal. We evaluate LIDOSS on a set of continuous control tasks in the MuJoCo domain against hierarchical actor-critic (HAC), a state-of-the-art end-to-end HRL method. The results show that LIDOSS attains better goal achievement rates than HAC in most of the tasks.
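To make the idea of a salience-based subgoal heuristic concrete, the sketch below illustrates one simple way such a bias could be realized: count how often (coarsely discretized) states appear on goal-reaching trajectories and prefer high-count states as subgoal candidates. This is only an illustrative sketch under assumptions, not the LIDOSS algorithm described in the paper; the names (SalienceBuffer, propose_subgoal), the grid discretization, and the softmax sampling are hypothetical choices made for demonstration.

```python
# Illustrative sketch of a frequency-based subgoal-salience heuristic.
# NOT the LIDOSS method itself; all class/function names and the
# discretization scheme are assumptions made for illustration only.
from collections import defaultdict
import numpy as np


class SalienceBuffer:
    """Tracks how often (discretized) states appear on goal-reaching trajectories."""

    def __init__(self, bin_size: float = 0.5):
        self.bin_size = bin_size
        self.counts = defaultdict(int)

    def _discretize(self, state: np.ndarray) -> tuple:
        # Coarse binning so nearby continuous states share one count.
        return tuple(np.floor(state / self.bin_size).astype(int))

    def add_trajectory(self, states, reached_goal: bool) -> None:
        # Only trajectories that actually reached the goal contribute.
        if not reached_goal:
            return
        for s in states:
            self.counts[self._discretize(s)] += 1

    def salience(self, state: np.ndarray) -> float:
        return float(self.counts[self._discretize(state)])


def propose_subgoal(candidates, buffer: SalienceBuffer, temperature: float = 1.0):
    """Sample a candidate subgoal, biased toward states with high salience counts."""
    scores = np.array([buffer.salience(c) for c in candidates], dtype=float)
    probs = np.exp(scores / temperature)
    probs /= probs.sum()
    idx = np.random.choice(len(candidates), p=probs)
    return candidates[idx]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    buf = SalienceBuffer(bin_size=0.5)
    # Pretend we logged a few successful 2-D trajectories near (1, 1).
    for _ in range(20):
        traj = rng.normal(loc=[1.0, 1.0], scale=0.3, size=(10, 2))
        buf.add_trajectory(traj, reached_goal=True)
    # One candidate lies in the frequently visited region, one far away.
    candidates = [np.array([1.0, 1.0]), np.array([5.0, -3.0])]
    print("chosen subgoal:", propose_subgoal(candidates, buf))
```

In this toy version, the higher-level search is narrowed simply because low-count candidates receive low sampling probability; the paper's actual heuristic operates within an end-to-end hierarchy and should be consulted for the real formulation.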

Authors

  • Shubham Pateria
  • Budhitama Subagdja
    ST Engineering-NTU Corporate Laboratory, Nanyang Technological University, Singapore. Electronic address: budhitama@ntu.edu.sg.
  • Ah-Hwee Tan
  • Chai Quek