Relating Human Error-Based Learning to Modern Deep RL Algorithms.

Journal: Neural Computation
PMID:

Abstract

In human error-based learning, the size and direction of a scalar error (i.e., the "directed error") are used to update future actions. Modern deep reinforcement learning (RL) methods perform a similar operation, but in terms of scalar rewards. Despite this similarity, the relationship between the action updates of deep RL and those of human error-based learning has yet to be investigated. Here, we systematically compare the three major families of deep RL algorithms to human error-based learning. We show that all three deep RL approaches are qualitatively different from human error-based learning, as assessed by a mirror-reversal perturbation experiment. To bridge this gap, we develop an alternative deep RL algorithm inspired by human error-based learning: model-based deterministic policy gradients (MB-DPG). We show that MB-DPG captures human error-based learning under mirror-reversal and rotational perturbations, and that MB-DPG learns faster than canonical model-free algorithms on complex arm-based reaching tasks while being more robust to (forward) model misspecification than model-based RL.
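
The abstract describes MB-DPG only at a high level. The sketch below is a minimal, hypothetical illustration of the idea as stated there: a deterministic policy is updated directly from a directed task error that is backpropagated through a learned, differentiable forward model, rather than from a scalar reward. All class and function names (Policy, ForwardModel, mbdpg_update), network sizes, and optimiser choices are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical MB-DPG-style update (illustrative sketch only; not the authors' code).
import torch
import torch.nn as nn

class Policy(nn.Module):
    """Deterministic policy: state -> action."""
    def __init__(self, state_dim, action_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state):
        return self.net(state)

class ForwardModel(nn.Module):
    """Learned forward model: (state, action) -> predicted outcome (e.g., hand position)."""
    def __init__(self, state_dim, action_dim, outcome_dim, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, outcome_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

def mbdpg_update(policy, model, policy_opt, model_opt, state, target, observed_outcome):
    """One hypothetical update step.

    1) Fit the forward model to the observed outcome of the executed action.
    2) Update the policy by passing its action through the forward model and
       minimising the directed (signed) error between prediction and target.
    """
    # Forward-model learning: supervised regression on the observed outcome.
    action = policy(state).detach()          # executed action; no policy gradient here
    model_loss = ((model(state, action) - observed_outcome) ** 2).mean()
    model_opt.zero_grad()
    model_loss.backward()
    model_opt.step()

    # Policy learning: directed error backpropagated through the forward model.
    action = policy(state)                   # re-evaluate with gradients enabled
    error = model(state, action) - target    # directed task error, not a scalar reward
    policy_loss = (error ** 2).mean()
    policy_opt.zero_grad()
    policy_loss.backward()                   # gradient flows through the model into the policy
    policy_opt.step()                        # only the policy parameters are updated here
    return model_loss.item(), policy_loss.item()

# Example usage with toy dimensions (all values illustrative):
policy = Policy(state_dim=4, action_dim=2)
model = ForwardModel(state_dim=4, action_dim=2, outcome_dim=2)
policy_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
model_opt = torch.optim.Adam(model.parameters(), lr=1e-3)
```

The contrast with standard model-free deep RL in this sketch is that the policy gradient is taken with respect to a vector-valued, signed error routed through the forward model, rather than with respect to an expected scalar reward.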

Authors

  • Michele Garibbo
    Department of Engineering Mathematics, Faculty of Engineering, University of Bristol, Bristol BS8 1QU, U.K. michele.garibbo@bristol.ac.uk
  • Casimir J H Ludwig
    School of Psychological Science, University of Bristol, Bristol, U.K.
  • Nathan F Lepora
  • Laurence Aitchison
    Computational and Biological Learning Lab, Department of Engineering, University of Cambridge, Cambridge, U.K.