Hierarchical task network-enhanced multi-agent reinforcement learning: Toward efficient cooperative strategies.
Journal:
Neural networks : the official journal of the International Neural Network Society
PMID:
39965524
Abstract
Navigating multi-agent reinforcement learning (MARL) environments with sparse rewards is notoriously difficult, particularly in suboptimal settings where exploration can be prematurely halted. To tackle these challenges, we introduce Hierarchical Symbolic Multi-Agent Reinforcement Learning (HS-MARL), a novel approach that incorporates hierarchical knowledge into MARL to effectively reduce the exploration space. We design intermediate states to decompose the state space into a hierarchical structure, represented using the Hierarchical Domain Definition Language (HDDL) and the option framework, forming domain knowledge and a symbolic option set. We leverage pyHIPOP+, an enhanced hierarchical task network (HTN) planner, to generate action sequences. A high-level meta-controller then assigns these symbolic options as policy functions, guiding low-level agents in their exploration of the environment. During this process, the meta-controller computes intrinsic rewards from the environmental rewards collected, which are used to train the symbolic option policies and refine pyHIPOP+'s heuristic function, thereby optimizing future action sequences. We compare HS-MARL against 15 state-of-the-art algorithms across two types of environments: four with sparse rewards and suboptimal conditions, and a real-world scenario involving a football match. Additionally, we perform an ablation study on HS-MARL's intrinsic reward mechanism and pyHIPOP+, along with a sensitivity analysis of intrinsic reward hyperparameters. Our results show that HS-MARL significantly outperforms other methods in environments with sparse rewards and suboptimal conditions, underscoring the critical role of its intrinsic reward design and the pyHIPOP+ component. The code is available at: https://github.com/Mxc666/HS-MARL.git.
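The interplay between a planned sequence of symbolic options and the meta-controller's intrinsic reward can be illustrated with a minimal sketch. It is not taken from the HS-MARL repository: the class names, the `bonus` and `step_penalty` parameters, the reward formula, and the toy key-and-door predicates are all assumptions made for illustration, and the plan is hard-coded where an HTN planner such as pyHIPOP+ would supply it. Only the termination condition of each option is modeled; the initiation set and the learned low-level policy are omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Symbolic state: predicate name -> truth value (hypothetical representation).
State = Dict[str, bool]


@dataclass
class SymbolicOption:
    """A symbolic option reduced to its termination condition.

    In the option framework an option also has an initiation set and a
    policy; both are left out of this sketch.
    """
    name: str
    is_done: Callable[[State], bool]  # True once the intermediate state is reached


def intrinsic_reward(option: SymbolicOption, state: State, env_reward: float,
                     bonus: float = 1.0, step_penalty: float = 0.01) -> float:
    """Assumed shaping rule: environment reward, minus a small per-step cost,
    plus a bonus when the option's intermediate state is reached."""
    reward = env_reward - step_penalty
    if option.is_done(state):
        reward += bonus
    return reward


class MetaController:
    """Walks through a fixed plan of options and emits intrinsic rewards."""

    def __init__(self, plan: List[SymbolicOption]):
        self.plan = plan   # stand-in for a planner-generated option sequence
        self.index = 0     # position of the option currently being pursued

    def current_option(self) -> Optional[SymbolicOption]:
        return self.plan[self.index] if self.index < len(self.plan) else None

    def step(self, state: State, env_reward: float) -> float:
        option = self.current_option()
        if option is None:  # plan finished, nothing left to reward
            return 0.0
        reward = intrinsic_reward(option, state, env_reward)
        if option.is_done(state):
            self.index += 1  # advance to the next option in the plan
        return reward


# Toy plan: pick up a key, then open a door.
get_key = SymbolicOption("get_key", lambda s: s["has_key"])
open_door = SymbolicOption("open_door", lambda s: s["door_open"])
```

In this toy setting, calling `MetaController([get_key, open_door]).step(state, env_reward)` once per environment step yields a dense training signal for the active option even when `env_reward` is zero, which is the general role the abstract attributes to intrinsic rewards. How HS-MARL actually computes these rewards and feeds them back into the planner's heuristic is specified in the paper and repository, not here.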