A simulated dataset for proactive robot task inference from streaming natural language dialogues.

Journal: Scientific Data
Published Date:

Abstract

This paper introduces a dataset designed to support research on proactive robots that infer human needs from natural language conversations. Unlike traditional human-robot interaction datasets, which focus on explicit commands, this dataset captures implicit task requests embedded in multi-party dialogues. It simulates realistic workplace environments across 10 diverse scenarios, including biotechnology research centers, legal consulting firms, and game development studios. The dataset comprises 10,000 synthetic dialogues generated with a large language model-based pipeline and covers a wide range of topics, from task-related discussions to casual conversations. It focuses on common workplace tasks, such as borrowing, distributing, and processing items. The dataset provides a resource for advancing proactive robotic systems and supports research in natural language understanding, intent recognition, and autonomous task inference.

Authors

  • Haifeng Xu
  • Chunwen Li
    Department of Automation, Tsinghua University, Beijing, China.
  • Xiaohu Yuan
    Department of Computer Science and Technology, Tsinghua University, Beijing, China.
  • Tao Zhi
    Beijing Yunji Technology Co., Ltd., Beijing, China.
  • Huaping Liu
    School of Nursing, Peking Union Medical College, Beijing, China.