Dopamine drives a positive reward bias on human reinforcement learning

Journal: bioRxiv

Abstract

Formal theories of reinforcement learning (RL) prescribe a clearly defined function for dopamine, namely modulating learning via reward prediction errors (RPEs). Yet empirical evidence in humans remains scarce, and recent advances introducing noisy RL cast doubt on a simple one-to-one mapping between neurotransmitters and computational mechanisms. Here, we detail a double-blind, placebo-controlled, randomised pharmacological study using the dopamine precursor L-DOPA, in which healthy volunteers performed a volatile two-armed bandit task. Behaviourally, L-DOPA decreased switching behaviour following below-average rewards. Algorithmic RL modelling of human behaviour supported a dual effect of L-DOPA on the rate and precision of learning. By leveraging recurrent neural networks (RNNs) as implementational models of RL, we explain this dual effect through a single inference-time modulation, whereby L-DOPA triggers a positive reward bias at the input of the recurrent layer that implements RL. Our findings highlight a unifying mechanism at the implementation level that explains seemingly disparate algorithmic effects of dopamine.
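The following is a rough illustration of the algorithmic terms used above. It substitutes a plain noisy delta rule for the recurrent network analysed in the study. Several choices here are hypothetical and made only for this example: the function names, the parameter values (`alpha`, `beta`, `zeta`, `reward_bias`), and the assumption that learning noise scales with the magnitude of the RPE. The sketch shows how a constant positive bias added to the reward input changes a single update after a below-average reward. It also shows how that update shifts the softmax probability of repeating the same choice.

```python
import numpy as np


def noisy_delta_update(q, choice, reward, alpha, zeta=0.0, reward_bias=0.0, rng=None):
    """One update of a noisy delta-rule learner (hypothetical sketch, not the study's model).

    The RPE is computed on a biased reward input (reward + reward_bias).
    Learning noise, if zeta > 0, is Gaussian with s.d. zeta * |RPE| (assumed form).
    Only the chosen arm is updated; the input array is left unchanged.
    """
    q = np.asarray(q, dtype=float).copy()
    rpe = (reward + reward_bias) - q[choice]  # reward prediction error on the biased input
    noise = 0.0
    if zeta > 0.0:
        rng = rng if rng is not None else np.random.default_rng()
        noise = zeta * abs(rpe) * rng.standard_normal()
    q[choice] += alpha * rpe + noise
    return q


def p_stay(q, choice, beta):
    """Softmax probability of repeating `choice` on a two-armed bandit."""
    return 1.0 / (1.0 + np.exp(-beta * (q[choice] - q[1 - choice])))


# Equal prior values, then a below-average reward (0.4 < 0.5) on arm 0.
q_prior = np.array([0.5, 0.5])
q_plain = noisy_delta_update(q_prior, choice=0, reward=0.4, alpha=0.3)
q_biased = noisy_delta_update(q_prior, choice=0, reward=0.4, alpha=0.3, reward_bias=0.2)
# Without the bias, repeating arm 0 is below 0.5; with the bias, it is above 0.5.
print(p_stay(q_plain, 0, beta=5.0), p_stay(q_biased, 0, beta=5.0))
```

With `zeta = 0`, the biased update equals the unbiased one plus a constant `alpha * reward_bias` on the chosen arm. This is why the bias pushes a below-average outcome towards staying rather than switching. This linear toy only conveys the direction of the behavioural effect and does not stand in for the recurrent-network account.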

Authors

  • Arnaud Zalta; Vasilisa Skvortsova; Samuel R. Hewitt; Michael Moutoussis; Matthew M. Nour; Raymond J. Dolan; Charles Findling; Tobias U. Hauser; Valentin Wyart