Self-Referencing Agents for Unsupervised Reinforcement Learning.

Journal: Neural Networks: the official journal of the International Neural Network Society

Abstract

Current unsupervised reinforcement learning (URL) methods often overlook reward nonstationarity during pre-training and the forgetting of exploratory behavior during fine-tuning. Our study introduces Self-Reference (SR), a novel add-on module designed to address both issues. During pre-training, SR stabilizes intrinsic rewards by referencing historical experience, mitigating nonstationarity; during fine-tuning, it preserves exploratory behaviors, retaining valuable skills. Our approach significantly boosts the performance and sample efficiency of existing model-free URL methods on the Unsupervised Reinforcement Learning Benchmark, improving the interquartile mean (IQM) by up to 17% and reducing the Optimality Gap by 31%. This highlights the general applicability and compatibility of our add-on module with existing methods.
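
The abstract describes the historical-referencing mechanism only at a high level. The sketch below is a minimal illustration of one plausible reading, assuming a kNN particle-based intrinsic reward of the kind used by model-free URL methods such as APT; every name in it (HistoryBuffer, knn_reward, self_reference_reward, capacity, k) is hypothetical and not the authors' API. The idea: score novelty against a reference set that mixes historical embeddings with the current batch, so the reward scale drifts less as the policy's state distribution shifts.

    """Hypothetical sketch of a Self-Reference-style add-on; not the paper's
    algorithm. Assumes observations are already embedded as float32 vectors."""
    import numpy as np


    class HistoryBuffer:
        """Fixed-size FIFO store of past observation embeddings."""

        def __init__(self, capacity: int = 4096, dim: int = 32):
            self.capacity = capacity
            self.buf = np.zeros((capacity, dim), dtype=np.float32)
            self.size = 0
            self.ptr = 0

        def add(self, batch: np.ndarray) -> None:
            # Overwrite oldest entries once the buffer is full.
            for row in batch:
                self.buf[self.ptr] = row
                self.ptr = (self.ptr + 1) % self.capacity
                self.size = min(self.size + 1, self.capacity)

        def all(self) -> np.ndarray:
            return self.buf[: self.size]


    def knn_reward(queries: np.ndarray, refs: np.ndarray, k: int = 12) -> np.ndarray:
        """Particle-based intrinsic reward: log of mean distance to the k
        nearest reference embeddings (larger = more novel)."""
        d = np.linalg.norm(queries[:, None, :] - refs[None, :, :], axis=-1)
        k = min(k, refs.shape[0])
        nearest = np.sort(d, axis=1)[:, :k]
        return np.log(1.0 + nearest.mean(axis=1))


    def self_reference_reward(batch: np.ndarray, history: HistoryBuffer,
                              k: int = 12) -> np.ndarray:
        """Compute the intrinsic reward against history + current batch,
        then record the batch. Including historical references keeps the
        reward less nonstationary than scoring against the batch alone."""
        refs = batch if history.size == 0 else np.concatenate([history.all(), batch])
        r = knn_reward(batch, refs, k)
        history.add(batch)
        return r


    if __name__ == "__main__":
        history = HistoryBuffer(capacity=4096, dim=32)
        obs_embeddings = np.random.randn(256, 32).astype(np.float32)
        print(self_reference_reward(obs_embeddings, history).shape)  # (256,)

Under this reading, keeping the same buffer through fine-tuning would let the agent keep referencing its pre-training experience, which is one way the abstract's second claim (preserving exploratory behavior) could be realized.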

Authors

  • Andrew Zhao
    Department of Automation, BNRist, Tsinghua University, China. Electronic address: andrewzhao112@gmail.com.
  • Erle Zhu
    Department of Computer Science, BNRist, Tsinghua University, China.
  • Rui Lu
    Department of Health Statistics, College of Public Health, Tianjin Medical University, Heping District, Tianjin, P.R. China.
  • Matthieu Lin
    Department of Computer Science, BNRist, Tsinghua University, China.
  • Yong-Jin Liu
    Department of Computer Science and Technology, Tsinghua University, Beijing, China.
  • Gao Huang
Department of Automation, Tsinghua University, Beijing 100084, China. Electronic address: huang-g09@mails.tsinghua.edu.cn.