Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction
Journal: arXiv
Published Date: Apr 20, 2025
Abstract
Building a generalizable self-correction system is crucial for robots to
recover from failures. Although advances in Multimodal Large Language Models
(MLLMs) give robots the ability to reflect semantically on failures,
translating that semantic reflection into fine-grained corrections of robotic
actions remains a significant challenge. To address this gap, we build the
Phoenix framework, which leverages motion instruction as a bridge to connect
high-level semantic reflection with low-level robotic action correction. In
this motion-based self-reflection framework, we start with a dual-process
motion adjustment mechanism with MLLMs to translate the semantic reflection
into coarse-grained adjustments of the motion instruction. To use this motion
instruction to guide fine-grained action correction, we propose a
multi-task motion-conditioned diffusion policy that integrates visual
observations for high-frequency robotic action correction. By combining these
two models, we shift the demand for generalization capability from the
low-level manipulation policy to the MLLM-driven motion adjustment model and
facilitate precise, fine-grained robotic action correction. Utilizing this
framework, we further develop a lifelong learning method to automatically
improve the model's capability through interactions with dynamic environments.
Experiments in both RoboMimic simulation and real-world scenarios
demonstrate the superior generalization and robustness of our framework across a
variety of manipulation tasks. Our code is released at
https://github.com/GeWu-Lab/Motion-based-Self-Reflection-Framework.
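
The two-level structure described in the abstract can be illustrated with a toy control loop. This is only a sketch under stated assumptions, not the released implementation. It reads "dual-process" as a predictor plus a corrector, which is an interpretation. A rule-based function stands in for the MLLM, a hand-written function stands in for the diffusion policy, and the state is a single 1-D position. All names (`predict_motion`, `correct_motion`, `motion_conditioned_policy`, `run_episode`, `STEP`) are hypothetical.

```python
from typing import List, Tuple

STEP = 0.05  # hypothetical displacement of one fine-grained action


def predict_motion(position: float, goal: float) -> str:
    """Stand-in for the MLLM motion predictor (deliberately naive)."""
    return "move right"


def correct_motion(position: float, goal: float, motion: str) -> str:
    """Stand-in for MLLM reflection: revise an instruction that moves away from the goal."""
    wanted = "move right" if goal > position else "move left"
    return motion if motion == wanted else wanted


def motion_conditioned_policy(position: float, motion: str, horizon: int = 4) -> List[float]:
    """Stand-in for the diffusion policy: a short chunk of actions conditioned on the motion."""
    sign = 1.0 if motion == "move right" else -1.0
    return [sign * STEP] * horizon


def run_episode(start: float, goal: float, tol: float = 0.03,
                max_rounds: int = 50) -> Tuple[float, List[str]]:
    """Low-frequency loop adjusts the motion instruction; inner loop executes actions."""
    position = start
    motions: List[str] = []
    for _ in range(max_rounds):
        if abs(goal - position) <= tol:
            break
        # Coarse level: propose a motion instruction, then reflect and adjust it.
        motion = correct_motion(position, goal, predict_motion(position, goal))
        motions.append(motion)
        # Fine level: execute the action chunk produced under that instruction.
        for action in motion_conditioned_policy(position, motion):
            position += action
            if abs(goal - position) <= tol:
                break
    return position, motions


if __name__ == "__main__":
    final_position, used_motions = run_episode(start=0.5, goal=0.0)
    print(final_position, used_motions)
```

The point of the split is that the low-level policy only needs to follow a small set of motion instructions, while the decision about which instruction to issue sits in the higher-level adjustment step.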