Enhance Exploration in Safe Reinforcement Learning with Contrastive Representation Learning
Journal:
arXiv
Published Date:
Mar 13, 2025
Abstract
In safe reinforcement learning, an agent must balance exploratory
actions against safety constraints. Following this paradigm, domain transfer
approaches learn a prior Q-function from the related environments to prevent
unsafe actions. However, because of the large number of false positives, some
safe actions are never executed, leading to inadequate exploration in
sparse-reward environments. In this work, we aim to learn an efficient state
representation that balances exploration against safety-preferring actions in
a sparse-reward environment. First, the image input is mapped to a latent
representation by an auto-encoder. A contrastive learning objective is then
employed to distinguish safe and unsafe states. In the learning phase, the
latent distance is used to construct an additional safety check, which allows
the agent to bias its exploration if it visits an unsafe state. To verify the
effectiveness of our method, we carry out experiments in three
navigation-based MiniGrid environments. The results show that our method
explores the environment better while maintaining a good balance between
safety and efficiency.
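
The two ideas in the abstract, a contrastive objective over latent states and a latent-distance safety check, can be illustrated with a minimal sketch. This is not the paper's implementation: the margin-based pairwise loss, the Euclidean metric, the threshold value, and every function name below (`contrastive_pair_loss`, `looks_unsafe`, `filter_actions`) are assumptions chosen for illustration, and the latent vectors stand in for the output of an auto-encoder that is not shown.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two latent vectors given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_pair_loss(z_a, z_b, same_label, margin=1.0):
    """Margin-based pairwise contrastive loss (an assumed, generic form).

    Pairs with the same safety label are pulled together; pairs with
    different labels are pushed apart until they are at least `margin` apart.
    """
    d = euclidean(z_a, z_b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


def looks_unsafe(z, unsafe_latents, threshold=0.5):
    """Hypothetical safety check: flag a latent state that lies within
    `threshold` of any stored unsafe latent."""
    return any(euclidean(z, u) < threshold for u in unsafe_latents)


def filter_actions(candidates, unsafe_latents, threshold=0.5):
    """Keep only actions whose predicted next latent passes the safety check.

    `candidates` maps an action name to its (assumed) predicted next latent.
    """
    return [
        action
        for action, z_next in candidates.items()
        if not looks_unsafe(z_next, unsafe_latents, threshold)
    ]
```

For example, with a single stored unsafe latent at the origin, `filter_actions({"left": [0.1, 0.0], "right": [2.0, 2.0]}, [[0.0, 0.0]])` keeps only `"right"`, because the predicted latent for `"left"` falls inside the threshold radius. How the agent actually biases exploration after a failed check is left open here, since the abstract does not specify it.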