Enhancing robotic skill acquisition with multimodal sensory data: A novel dataset for kitchen tasks.
Journal:
Scientific Data
PMID:
40118852
Abstract
The advent of large language models has transformed human-robot interaction by enabling robots to execute tasks via natural language commands. However, these models primarily depend on unimodal data, which limits their ability to integrate the diverse environmental, physiological, and physical signals essential to real-world operation. To address these limitations, this paper investigates novel and comprehensive multimodal data-collection methodologies that capture the complexity of human interaction in real-world kitchen environments. Data on the use of 17 different kitchen tools by 20 adults in dynamic scenarios were collected, including human tactile information, electromyography (EMG) signals, audio, whole-body motion, and eye-tracking data. The dataset comprises 680 segments (~11 hours) spanning seven modalities and includes 56,000 detailed annotations. This paper bridges the gap between real-world multimodal data and embodied AI, paving the way for a new benchmark in utility and repeatability for robotic skill learning.
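To illustrate how one annotated segment of such a dataset might be organized, the minimal sketch below defines a hypothetical record type in Python. The class name `KitchenSegment`, the field names, and the array shapes are all assumptions for illustration only; the abstract does not specify the dataset's actual schema, file format, or sampling rates.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class KitchenSegment:
    """One annotated task segment (hypothetical schema; the dataset's
    actual layout and field names are not given in the abstract)."""
    subject_id: int                 # one of the 20 adult participants
    tool: str                       # one of the 17 kitchen tools
    tactile: np.ndarray             # tactile frames, e.g. (T, taxels)
    emg: np.ndarray                 # EMG channels over time, e.g. (T, channels)
    audio: np.ndarray               # mono audio waveform
    body_pose: np.ndarray           # whole-body motion, e.g. (T, joints, 3)
    gaze: np.ndarray                # eye-tracking samples, e.g. (T, 2)
    annotations: list[str] = field(default_factory=list)  # per-segment labels

# Usage example with synthetic placeholder arrays (not real data).
seg = KitchenSegment(
    subject_id=1,
    tool="whisk",
    tactile=np.zeros((100, 64)),
    emg=np.zeros((100, 8)),
    audio=np.zeros(16000),
    body_pose=np.zeros((100, 25, 3)),
    gaze=np.zeros((100, 2)),
    annotations=["grasp", "stir"],
)
print(seg.tool, seg.emg.shape)
```

Grouping all time-aligned modalities under one per-segment record like this is one plausible way to serve the 680 segments to a skill-learning pipeline, but readers should consult the published data descriptor for the authors' actual structure.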