Adaptive bias-variance trade-off in advantage estimator for actor-critic algorithms.

Journal: Neural networks : the official journal of the International Neural Network Society
Published Date:

Abstract

Actor-critic methods lead on many challenging continuous control tasks. Advantage estimators, the most common critics in the actor-critic framework, combine state values from bootstrapped value functions with sampled returns. Different combinations balance the bias introduced by state values against the variance introduced by sample returns so as to reduce estimation error. Because bias and variance fluctuate throughout training, the optimal combination also changes. However, existing advantage estimators typically use fixed combinations that fail to trade off bias against variance in search of the optimal estimate. Our previous work on adaptive advantage estimation (AAE) analyzed the sources of bias and variance and proposed two indicators. This paper further explores, through representative numerical experiments, the relationship between these indicators and the optimal combination. These analyses yield a general form of adaptive combination of state values and sample returns that achieves low estimation error. Empirical results on simulated robotic locomotion tasks show that the proposed estimators match or surpass the generalized advantage estimator (GAE).
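For context, the fixed combination the abstract contrasts against is the standard generalized advantage estimator (GAE), which blends bootstrapped state values and sample returns through a single parameter λ. Below is a minimal sketch of that baseline (not the paper's adaptive AAE variant); the function name and array layout are illustrative assumptions.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard GAE baseline sketch (not the paper's adaptive method).

    lam interpolates between the high-bias one-step TD estimate
    (lam=0, leaning on the bootstrapped value function) and the
    high-variance Monte Carlo return (lam=1, leaning on samples).
    The paper's adaptive estimators vary this trade-off during
    training instead of fixing it.
    """
    T = len(rewards)
    # values holds one extra entry, V(s_T), for bootstrapping.
    advantages = np.zeros(T)
    gae = 0.0
    for t in reversed(range(T)):
        # One-step TD error: biased by V, but low variance.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of TD errors along the trajectory.
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages
```

With lam=0 this reduces to the one-step TD error; with lam=1 it reduces to the Monte Carlo return minus the state value, which is the bias-variance spectrum the adaptive estimators navigate.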

Authors

  • Yurou Chen
    The State Key Lab of Management and Control for Complex Systems, Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China. Electronic address: chenyurou2019@ia.ac.cn.
  • Fengyi Zhang
  • Zhiyong Liu