Deep reinforcement learning can promote sustainable human behaviour in a common-pool resource problem.

Journal: Nature Communications
PMID:

Abstract

A canonical social dilemma arises when resources are allocated to people, who can either reciprocate with interest or keep the proceeds. The right resource allocation mechanisms can encourage levels of reciprocation that sustain the commons. Here, in an iterated multiplayer trust game, we use deep reinforcement learning (RL) to design a social planner that promotes sustainable contributions from human participants. We first train neural networks to behave like human players, creating a simulated economy that allows us to study the dynamics of receipt and reciprocation. We then use RL to train a mechanism to maximise aggregate return to players. The RL mechanism discovers a redistributive policy that leads to a surplus that is both large and more equally shared. The mechanism outperforms baseline mechanisms by conditioning its generosity on available resources and temporarily sanctioning defectors. Examining the RL policy allows us to develop a similar but explainable mechanism that is more popular among players.
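To make the dynamics of receipt and reciprocation concrete, the following is a minimal sketch of one way an iterated multiplayer trust game could be simulated. It is illustrative only: the growth factor, the equal-split allocation rule, the fixed per-player reciprocation rates, and all function names are assumptions introduced here, not the game parameters, the human-like player networks, or the RL mechanism described in the abstract.

```python
from typing import List, Tuple


def play_round(
    pool: float,
    shares: List[float],
    reciprocation: List[float],
    growth: float = 1.5,  # hypothetical "interest" on reciprocated resources
) -> Tuple[List[float], float]:
    """Run one round: allocate the pool, let players reciprocate, regrow the pool.

    shares[i] is the fraction of the pool given to player i (must sum to 1).
    reciprocation[i] is the fraction of the allocation player i sends back.
    Returns (amount each player keeps, size of the next round's pool).
    """
    if len(shares) != len(reciprocation):
        raise ValueError("shares and reciprocation must have the same length")
    if abs(sum(shares) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")

    allocations = [pool * s for s in shares]
    returned = [a * r for a, r in zip(allocations, reciprocation)]
    kept = [a - b for a, b in zip(allocations, returned)]
    # Only reciprocated resources replenish the common pool.
    next_pool = growth * sum(returned)
    return kept, next_pool


def simulate_equal_split(
    pool: float, reciprocation: List[float], rounds: int, growth: float = 1.5
) -> Tuple[List[float], float]:
    """Iterate the game under an assumed equal-split baseline mechanism."""
    n = len(reciprocation)
    shares = [1.0 / n] * n
    totals = [0.0] * n
    for _ in range(rounds):
        kept, pool = play_round(pool, shares, reciprocation, growth)
        totals = [t + k for t, k in zip(totals, kept)]
    return totals, pool


if __name__ == "__main__":
    # One full reciprocator and one defector sharing a pool of 100.
    totals, final_pool = simulate_equal_split(100.0, [1.0, 0.0], rounds=3, growth=2.0)
    print(totals, final_pool)
```

In this toy setting an equal split keeps paying the defector, who accumulates the entire surplus while the reciprocator alone sustains the pool. A mechanism that conditions allocations on past reciprocation, as the abstract describes for the RL policy, would replace the fixed `shares` with a rule that depends on game history.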

Authors

  • Raphael Koster
    DeepMind, London, UK.
  • Miruna Pîslar
    Google DeepMind, London, UK. mirunapislar@google.com.
  • Andrea Tacchetti
    DeepMind, London, UK.
  • Jan Balaguer
    DeepMind, London, UK.
  • Leqi Liu
    Google DeepMind, London, UK.
  • Romuald Elie
    DeepMind Technologies Ltd., London, UK.
  • Oliver P Hauser
    Department of Economics, University of Exeter, Exeter, UK.
  • Karl Tuyls
    DeepMind Technologies Ltd., London, UK.
  • Matt Botvinick
    DeepMind, 5 New Street Square, London EC4A 3TW, UK.
  • Christopher Summerfield
    DeepMind, 5 New Street Square, London, UK; Department of Experimental Psychology, University of Oxford, Oxford, UK.