MAPLE: Encoding Dexterous Robotic Manipulation Priors Learned From Egocentric Videos
Journal: arXiv
Published Date: Apr 8, 2025
Abstract
Large-scale egocentric video datasets capture diverse human activities across
a wide range of scenarios, offering rich and detailed insights into how humans
interact with objects, especially those that require fine-grained dexterous
control. Such dexterous skills, which demand precise control, are crucial for
many robotic manipulation tasks, yet traditional data-driven approaches to
robotic manipulation often address them insufficiently. To address this
gap, we leverage manipulation priors learned from large-scale egocentric video
datasets to improve policy learning for dexterous robotic manipulation tasks.
We present MAPLE, a novel method for dexterous robotic manipulation that
exploits rich manipulation priors to enable efficient policy learning and
better performance on diverse, complex manipulation tasks. Specifically, we
predict hand-object contact points and detailed hand poses at the moment of
hand-object contact and use the learned features to train policies for
downstream manipulation tasks. Experimental results demonstrate the
effectiveness of MAPLE across existing simulation benchmarks, as well as a
newly designed set of challenging simulation tasks, which require fine-grained
object control and complex dexterous skills. The benefits of MAPLE are further
highlighted in real-world experiments using a dexterous robotic hand. Joint
evaluation in both simulation and the real world has remained underexplored
in prior work.
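
The prior-learning objective described above (predicting hand-object contact points and the hand pose at the moment of contact) can be illustrated with a small sketch. This is not the authors' implementation: the `ContactLabel` container, the `prior_loss` function, the `pose_weight` parameter, and the choice of a plain mean-squared-error objective are all assumptions made for illustration, since the abstract does not specify how the two predictions are parameterized or combined.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ContactLabel:
    """Hypothetical prediction target for one frame (names are illustrative)."""
    contact_points: List[Tuple[float, float]]  # assumed: normalized image coordinates
    hand_pose: List[float]                     # assumed: flattened hand pose parameters


def mean_squared_error(pred: Sequence[float], target: Sequence[float]) -> float:
    """Average squared difference between two equally long sequences."""
    if len(pred) != len(target) or len(pred) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_loss(pred: ContactLabel, target: ContactLabel,
               pose_weight: float = 1.0) -> float:
    """Assumed combined objective: contact-point error plus weighted pose error."""
    flat_pred = [c for point in pred.contact_points for c in point]
    flat_target = [c for point in target.contact_points for c in point]
    contact_term = mean_squared_error(flat_pred, flat_target)
    pose_term = mean_squared_error(pred.hand_pose, target.hand_pose)
    return contact_term + pose_weight * pose_term


# Toy example with one contact point and a two-parameter hand pose.
target = ContactLabel(contact_points=[(0.5, 0.5)], hand_pose=[0.0, 0.0])
pred = ContactLabel(contact_points=[(0.5, 0.7)], hand_pose=[0.1, 0.0])
loss = prior_loss(pred, target)
```

In a setup like this, an encoder trained to minimize such a loss would supply the learned features that the abstract describes as inputs to downstream policy training; how those features are actually extracted and consumed is not stated in the abstract.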