Deterministic dynamics of distributional multi-agent reinforcement learning

Journal: bioRxiv
Published Date:

Abstract

Understanding how cognition shapes behavior across contexts remains a fundamental challenge for many disciplines. In particular, for the optimism heuristic (i.e., the tendency to overweight positive relative to negative information), knowledge remains fragmented, with models developed in specific domains in isolation. Here, we present a unifying computational framework by deriving the deterministic dynamics of distributional multi-agent reinforcement learning. Our approach discretizes reward distributions through a finite set of neurons, consistent with recent empirical findings on distributional coding in the brain. We validate our framework by reproducing established results across three iconic domains spanning individual exploration versus exploitation, social coordination, and risky choice. Beyond validation, we uncover novel interactions among optimism, reward discretization, and temporal discounting. Specifically, we identify conditions under which choice hysteresis and path-dependent strategies emerge, suggesting that perseveration results from neural reward discretization rather than constituting an independent heuristic. We further reveal “individual dilemmas”: circumstances where agents gravitate toward suboptimal yet stable strategies, offering a mechanistic explanation for incoherent choice patterns. Our framework bridges neuroscience, psychology, and collective behavior, enabling empirically testable hypotheses about how cognitive biases propagate from individual cognition to social outcomes in complex environments.
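
As a rough illustration of two ideas named in the abstract (optimism as overweighting positive information, and a reward distribution represented by a finite set of units), the following minimal sketch may help. It is not the authors' model: the function names, the per-unit optimism parameter `tau`, the learning rate `alpha`, and the two-outcome reward distribution are all assumptions made for illustration. Each hypothetical unit scales positive prediction errors by `tau` and negative ones by `1 - tau`, and the update is taken in expectation over a known reward distribution, so the trajectory is deterministic.

```python
def expected_update(value, tau, rewards, probs, alpha=0.1):
    """One expectation-level step of an asymmetric value update.

    Assumption (illustrative only): positive prediction errors are
    weighted by tau, negative ones by (1 - tau).
    """
    delta = 0.0
    for r, p in zip(rewards, probs):
        error = r - value
        weight = tau if error > 0 else 1.0 - tau
        delta += p * weight * error
    return value + alpha * delta


def run_units(taus, rewards, probs, steps=2000, alpha=0.1):
    """Iterate the deterministic update for a finite set of units."""
    values = [0.0 for _ in taus]
    for _ in range(steps):
        values = [
            expected_update(v, t, rewards, probs, alpha)
            for v, t in zip(values, taus)
        ]
    return values


# Hypothetical example: reward is 0 or 1 with equal probability.
rewards = [0.0, 1.0]
probs = [0.5, 0.5]
taus = [0.2, 0.5, 0.8]  # pessimistic, neutral, optimistic unit
values = run_units(taus, rewards, probs)
```

In this particular two-outcome example, setting the expected update to zero gives a fixed point at `value = tau` for each unit, so the more optimistic units settle on higher estimates and the set of units together spans the reward distribution.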

Authors

  • Clémence Bergerot; Pawel Romanczuk; Wolfram Barfuss