EdgeWisePersona: A Dataset for On-Device User Profiling from Natural Language Interactions
Journal:
arXiv
Published Date:
May 16, 2025
Abstract
This paper introduces a novel dataset and evaluation benchmark designed to
assess and improve small language models deployable on edge devices, with a
focus on user profiling from multi-session natural language interactions in
smart home environments. At the core of the dataset are structured user
profiles, each defined by a set of routines: context-triggered, repeatable
patterns of behavior that govern how users interact with their home systems.
Using these profiles as input, a large language model (LLM) generates
corresponding interaction sessions that simulate realistic, diverse, and
context-aware dialogues between users and their devices.
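
As an illustration of what a routine-based profile could look like, here is a minimal Python sketch. The abstract does not specify the dataset's schema, so every name below (`Routine`, `UserProfile`, `trigger`, `actions`, and the context keys and device settings) is a hypothetical assumption chosen for illustration, not the authors' format.

```python
from dataclasses import dataclass, field


@dataclass
class Routine:
    # Hypothetical schema: context conditions that trigger the routine,
    # and the device settings the user applies when it fires.
    trigger: dict
    actions: dict

    def applies_to(self, context: dict) -> bool:
        # The routine fires when every trigger condition holds in the context.
        return all(context.get(key) == value for key, value in self.trigger.items())


@dataclass
class UserProfile:
    user_id: str
    routines: list = field(default_factory=list)

    def expected_actions(self, context: dict) -> dict:
        # Merge the actions of all routines triggered by this context.
        merged = {}
        for routine in self.routines:
            if routine.applies_to(context):
                merged.update(routine.actions)
        return merged


# Illustrative profile; the values are invented for this sketch.
profile = UserProfile(
    user_id="user_001",
    routines=[
        Routine(
            trigger={"time_of_day": "evening", "day_type": "weekday"},
            actions={"lights.brightness": 40, "thermostat.temperature_c": 21},
        ),
        Routine(
            trigger={"weather": "rainy"},
            actions={"speaker.playlist": "jazz"},
        ),
    ],
)
```

Under these assumptions, calling `profile.expected_actions(context)` with a context dictionary returns the device settings implied by the routines that match it. Profile reconstruction would then amount to recovering such a list of routines from the dialogues alone.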
The primary task supported by this dataset is profile reconstruction:
inferring user routines and preferences solely from interaction history. To
assess how well current models can perform this task under realistic
conditions, we benchmarked several state-of-the-art compact language models and
compared their performance against large foundation models. Our results show
that while small models demonstrate some capability in reconstructing profiles,
they still fall significantly short of large models in accurately capturing
user behavior. This performance gap poses a major challenge, particularly
because on-device processing offers critical advantages, such as preserving
user privacy, minimizing latency, and enabling personalized experiences without
reliance on the cloud. By providing a realistic, structured testbed for
developing and evaluating behavioral modeling under these constraints, our
dataset represents a key step toward enabling intelligent, privacy-respecting
AI systems that learn and adapt directly on user-owned devices.