Hierarchical task network-enhanced multi-agent reinforcement learning: Toward efficient cooperative strategies.

Journal: Neural Networks: The Official Journal of the International Neural Network Society
PMID:

Abstract

Navigating multi-agent reinforcement learning (MARL) environments with sparse rewards is notoriously difficult, particularly in suboptimal settings where exploration can be prematurely halted. To tackle these challenges, we introduce Hierarchical Symbolic Multi-Agent Reinforcement Learning (HS-MARL), a novel approach that incorporates hierarchical knowledge into MARL to effectively reduce the exploration space. We design intermediate states that decompose the state space into a hierarchical structure, represented with the Hierarchical Domain Definition Language (HDDL) and the option framework to form domain knowledge and a symbolic option set. We leverage pyHIPOP+, an enhanced hierarchical task network (HTN) planner, to generate action sequences. A high-level meta-controller then assigns these symbolic options as policy functions, guiding low-level agents as they explore the environment. During this process, the meta-controller computes intrinsic rewards from the environmental rewards collected; these intrinsic rewards are used to train the symbolic option policies and to refine pyHIPOP+'s heuristic function, thereby optimizing future action sequences. We evaluate HS-MARL against 15 state-of-the-art algorithms in two settings: four environments with sparse rewards and suboptimal conditions, and a real-world football-match scenario. We also perform an ablation study of HS-MARL's intrinsic reward mechanism and the pyHIPOP+ component, along with a sensitivity analysis of the intrinsic-reward hyperparameters. Our results show that HS-MARL significantly outperforms competing methods in environments with sparse rewards and suboptimal conditions, underscoring the critical role of its intrinsic reward design and the pyHIPOP+ component. The code is available at: https://github.com/Mxc666/HS-MARL.git.
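The abstract describes the control loop only at a high level, so the following minimal Python sketch illustrates the kind of interaction it outlines: a meta-controller executes planner-supplied symbolic options, shapes intrinsic rewards from the environmental rewards it collects, and feeds option returns back into a heuristic estimate that would steer future plans. Every name here (SymbolicOption, MetaController, ChainEnv, and the reward-shaping rule) is an illustrative assumption rather than the authors' API, and pyHIPOP+ is replaced by a fixed option sequence so the example runs standalone.

    import random
    from collections import defaultdict

    class SymbolicOption:
        # One symbolic option: a subgoal paired with its own tabular Q-policy
        # (a stand-in for the option policies trained in HS-MARL).
        def __init__(self, name, subgoal):
            self.name = name
            self.subgoal = subgoal            # state that terminates this option
            self.q = defaultdict(float)       # (state, action) -> estimated value

        def act(self, state, actions, eps=0.2):
            # Epsilon-greedy action selection over the option's Q-table.
            if random.random() < eps:
                return random.choice(actions)
            return max(actions, key=lambda a: self.q[(state, a)])

        def update(self, s, a, r_int, s2, actions, alpha=0.5, gamma=0.95):
            # One-step Q-learning update driven by the *intrinsic* reward.
            best_next = max(self.q[(s2, b)] for b in actions)
            self.q[(s, a)] += alpha * (r_int + gamma * best_next - self.q[(s, a)])

    class MetaController:
        # Assigns planned options to the low-level agent and shapes intrinsic rewards.
        def __init__(self, options, eta=1.0, bonus=1.0):
            self.options = {o.name: o for o in options}
            self.eta, self.bonus = eta, bonus
            self.heuristic = defaultdict(float)   # option name -> running cost estimate

        def intrinsic(self, r_env, reached_subgoal):
            # Assumed shaping rule: scaled environment reward plus a subgoal bonus.
            return self.eta * r_env + (self.bonus if reached_subgoal else 0.0)

        def run_option(self, name, env, max_steps=20):
            opt, s, ret = self.options[name], env.state, 0.0
            for _ in range(max_steps):
                a = opt.act(s, env.actions)
                s2, r_env = env.step(a)
                reached = (s2 == opt.subgoal)
                opt.update(s, a, self.intrinsic(r_env, reached), s2, env.actions)
                ret, s = ret + r_env, s2
                if reached:
                    break
            # Feed the collected return back into a heuristic cost estimate,
            # mimicking how HS-MARL refines the planner's heuristic function.
            self.heuristic[name] += 0.1 * (-ret - self.heuristic[name])
            return ret

    class ChainEnv:
        # Toy 1-D chain with a sparse reward only at the final state.
        def __init__(self, n=6):
            self.n, self.state, self.actions = n, 0, [-1, +1]

        def step(self, a):
            self.state = min(max(self.state + a, 0), self.n - 1)
            return self.state, (1.0 if self.state == self.n - 1 else 0.0)

    if __name__ == "__main__":
        env = ChainEnv()
        meta = MetaController([SymbolicOption("reach_mid", 3),
                               SymbolicOption("reach_goal", 5)])
        plan = ["reach_mid", "reach_goal"]    # stand-in for a pyHIPOP+ plan
        for _ in range(200):
            env.state = 0                      # reset episode
            for name in plan:
                meta.run_option(name, env)
        print("heuristic cost estimates:", dict(meta.heuristic))

In the actual system, the plan would come from pyHIPOP+ over an HDDL domain and each option policy would be a learned network rather than a Q-table; the sketch only shows where the intrinsic-reward and heuristic-update signals flow.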

Authors

  • Xuechen Mu
    School of Mathematics, Jilin University, Changchun 130012, China.
  • Hankz Hankui Zhuo
    School of Artificial Intelligence, Nanjing University, Nanjing, 210023, China; School of Computer Science and Engineering, Sun Yat-sen University, Guangzhou, 510275, China. Electronic address: hankz@nju.edu.cn.
  • Chen Chen
    The George Institute for Global Health, Faculty of Medicine, University of New South Wales, Sydney, NSW, Australia.
  • Kai Zhang
    Anhui Province Key Laboratory of Respiratory Tumor and Infectious Disease, First Affiliated Hospital of Bengbu Medical University, Bengbu, China.
  • Chao Yu
    Link Sense Laboratory, Nanjing Research Institute of Electronic Technology, Nanjing, China.
  • Jianye Hao
College of Intelligence and Computing, Tianjin University, Peiyang Park Campus: No.135 Yaguan Road, Haihe Education Park, Tianjin, 300350, China. Electronic address: haojianye@gmail.com.