A simulated dataset for proactive robot task inference from streaming natural language dialogues.
Journal:
Scientific Data
Published Date:
Aug 11, 2025
Abstract
This paper introduces a dataset designed to support research on proactive robots that infer human needs from natural language conversations. Unlike traditional human-robot interaction datasets, which focus on explicit commands, this dataset captures implicit task requests embedded in multi-party dialogues. It simulates realistic workplace environments across 10 diverse scenarios, including biotechnology research centers, legal consulting firms, and game development studios. The dataset comprises 10,000 synthetic dialogues generated with a large language model-based pipeline; they cover a wide range of topics, from task-related discussions to casual conversations. The tasks are common workplace activities, such as borrowing, distributing, and processing items. The dataset provides a resource for advancing proactive robotic systems, enabling research in natural language understanding, intent recognition, and autonomous task inference.