A Grounded Memory System For Smart Personal Assistants
Journal:
arXiv
Published Date:
May 9, 2025
Abstract
A wide variety of agentic AI applications, ranging from cognitive assistants
for dementia patients to robotics, demand a robust memory system grounded in
reality. In this paper, we propose such a memory system consisting of three
components. First, we combine Vision Language Models for image captioning and
entity disambiguation with Large Language Models for consistent information
extraction during perception. Second, the extracted information is represented
in a memory consisting of a knowledge graph enhanced by vector embeddings to
efficiently manage relational information. Third, we combine semantic search
and graph query generation for question answering via Retrieval Augmented
Generation. We illustrate how the system works, and what it could enable, using
a real-world example.
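
To make the second and third components more concrete, the following is a minimal sketch of a knowledge graph whose entities carry vector embeddings, with retrieval that first runs a semantic search and then follows graph edges. It is an illustration under stated assumptions, not the implementation described in the paper: the class `GraphMemory`, its methods, the toy bag-of-words `embed` function, and the example entities are all hypothetical. A plain neighbor lookup stands in for LLM-based graph query generation, and the final generation step is omitted.

```python
import math
from collections import Counter
from dataclasses import dataclass, field


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in (assumption) for a learned embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class GraphMemory:
    """Hypothetical memory: entity nodes with embeddings plus relation triples."""
    nodes: dict = field(default_factory=dict)  # entity name -> embedding of its description
    edges: list = field(default_factory=list)  # (subject, relation, object) triples

    def add_entity(self, name: str, description: str) -> None:
        self.nodes[name] = embed(description)

    def add_relation(self, subject: str, relation: str, obj: str) -> None:
        self.edges.append((subject, relation, obj))

    def semantic_search(self, query: str, k: int = 1) -> list:
        """Return the k entity names whose descriptions best match the query."""
        q = embed(query)
        ranked = sorted(self.nodes, key=lambda n: cosine(q, self.nodes[n]), reverse=True)
        return ranked[:k]

    def neighbors(self, name: str) -> list:
        """Return every triple that mentions the entity (stand-in for a generated graph query)."""
        return [e for e in self.edges if name in (e[0], e[2])]

    def retrieve(self, question: str, k: int = 1) -> list:
        """Semantic search to find entry points, then graph lookup to collect facts."""
        facts = []
        for entity in self.semantic_search(question, k):
            facts.extend(self.neighbors(entity))
        return facts


# Hypothetical observations, as they might be extracted during perception.
memory = GraphMemory()
memory.add_entity("keys", "house keys on a metal ring")
memory.add_entity("kitchen_table", "wooden table in the kitchen")
memory.add_entity("glasses", "reading glasses with a black frame")
memory.add_relation("keys", "located_on", "kitchen_table")
memory.add_relation("glasses", "located_in", "bedroom_drawer")

context = memory.retrieve("where are my house keys")
print(context)
```

In a Retrieval Augmented Generation setting, the triples in `context` would be passed to a language model together with the question. The split between the two retrieval steps is the point of the sketch: embeddings locate the relevant entity from free-form wording, while the graph supplies the explicit relations attached to it.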