Task Memory Engine: Spatial Memory for Robust Multi-Step LLM Agents
Journal:
arXiv
Published Date:
May 26, 2025
Abstract
Large Language Models (LLMs) falter in multi-step interactions -- often
hallucinating, repeating actions, or misinterpreting user corrections -- due to
reliance on linear, unstructured context. This fragility stems from the lack of
persistent memory to track evolving goals and task dependencies, undermining
trust in autonomous agents. We introduce the Task Memory Engine (TME), a
modular memory controller that transforms existing LLMs into robust,
revision-aware agents without fine-tuning. TME implements a spatial memory
framework that replaces flat context with graph-based structures to support
consistent, multi-turn reasoning. Departing from linear concatenation and
ReAct-style prompting, TME builds a dynamic task graph -- either a tree or
directed acyclic graph (DAG) -- to map user inputs to subtasks, align them with
prior context, and enable dependency-tracked revisions. Its Task Representation
and Intent Management (TRIM) component models task semantics and user intent to
ensure accurate interpretation. Across four multi-turn scenarios-trip planning,
cooking, meeting scheduling, and shopping cart editing -- TME eliminates 100%
of hallucinations and misinterpretations in three tasks, and reduces
hallucinations by 66.7% and misinterpretations by 83.3% across 27 user turns,
outperforming ReAct. TME's modular design supports plug-and-play deployment and
domain-specific customization, adaptable to both personal assistants and
enterprise automation. We release TME's codebase, benchmarks, and components as
open-source resources, enabling researchers to develop reliable LLM agents.
TME's scalable architecture addresses a critical gap in agent performance
across complex, interactive settings.