A fully value distributional deep reinforcement learning framework for multi-agent cooperation.

Journal: Neural networks : the official journal of the International Neural Network Society
PMID:

Abstract

Distributional reinforcement learning (RL) goes beyond estimating the expected value of future returns by modeling their entire distribution, which offers greater expressiveness and captures richer information about the value function. To leverage this advantage, distributional multi-agent systems based on value-decomposition techniques have recently been proposed. Ideally, a distributional multi-agent system should be fully distributional, meaning that both the individual and the global value functions are constructed in distributional form. However, recent studies show that directly applying traditional value-decomposition techniques to this fully distributional form cannot guarantee that the necessary individual-global-max (IGM) principle is satisfied. To address this problem, we propose a novel fully value distributional multi-agent framework based on value decomposition and prove that the IGM principle is guaranteed under our framework. Based on this framework, we propose a practical deep reinforcement learning model called Fully Distributional Multi-Agent Cooperation (FDMAC) and verify its effectiveness in different scenarios of the StarCraft Multi-Agent Challenge micromanagement environment. Further experimental results show that FDMAC outperforms the best baseline by 10.47% on average in terms of the median test win rate.
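
For readers unfamiliar with the IGM principle, the following minimal Python sketch illustrates what it requires: the joint action that maximizes the global value must coincide with the tuple of each agent's individually greedy action. This is not the FDMAC model. It assumes a plain additive decomposition of expected values, with each agent's return distribution represented by equally weighted quantile samples; the function names, array shapes, and random data are hypothetical and chosen only for illustration.

```python
import itertools
import numpy as np


def greedy_individual(quantiles):
    """Each agent picks the action with the highest expected return.

    quantiles: array of shape (n_agents, n_actions, n_quantiles) holding
    hypothetical equally weighted quantile samples of each agent's return.
    """
    expected = quantiles.mean(axis=-1)  # mean over quantiles = expected value
    return tuple(int(a) for a in expected.argmax(axis=-1))


def greedy_joint(quantiles):
    """Brute-force search for the joint action maximizing the global value.

    Assumption: the global expected value is the sum of the agents'
    expected values (an additive decomposition used here for illustration).
    """
    n_agents, n_actions, _ = quantiles.shape
    expected = quantiles.mean(axis=-1)
    best_joint, best_value = None, -np.inf
    for joint in itertools.product(range(n_actions), repeat=n_agents):
        value = sum(expected[i, a] for i, a in enumerate(joint))
        if value > best_value:
            best_joint, best_value = joint, value
    return best_joint


# Illustrative random data: 3 agents, 4 actions, 8 quantiles per action.
rng = np.random.default_rng(0)
quantiles = rng.normal(size=(3, 4, 8))

# Under the additive assumption, the IGM condition holds.
assert greedy_individual(quantiles) == greedy_joint(quantiles)
```

The check passes here because a sum is maximized by maximizing each term separately. The abstract's point is that this consistency is not automatic once both the individual and the global value functions are full distributions, which is the gap the proposed framework addresses.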

Authors

  • Mingsheng Fu
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China. Electronic address: fms@uestc.edu.cn.
  • Liwei Huang
    School of Computer Science and Engineering, University of Electronic Science and Technology of China, Chengdu, 611731, Sichuan, China. Electronic address: liweihuang@uestc.edu.cn.
  • Fan Li
    Department of Instrument Science and Engineering, School of SEIEE, Shanghai Jiao Tong University, Shanghai 200240, China.
  • Hong Qu
    Center for Bioinformatics, State Key Laboratory of Protein and Plant Gene Research, College of Life Sciences, Peking University, Beijing 100871, China.
  • Chengzhong Xu
    State Key Laboratory of IoTSC, University of Macau, Taipa, 999078, Macao Special Administrative Region of China. Electronic address: czxu@um.edu.mo.