Attribute-Based Robotic Grasping with Data-Efficient Adaptation
Journal: arXiv
Published Date: Jan 4, 2025
Abstract
Robotic grasping is one of the most fundamental robotic manipulation tasks
and has been the subject of extensive research. However, swiftly teaching a
robot to grasp a novel target object in clutter remains challenging. This paper
attempts to address the challenge by leveraging object attributes that
facilitate recognition, grasping, and rapid adaptation to new domains. In this
work, we present an end-to-end encoder-decoder network to learn attribute-based
robotic grasping with data-efficient adaptation capability. We first pre-train
the end-to-end model with a variety of basic objects to learn generic attribute
representation for recognition and grasping. Our approach fuses the embeddings
of a workspace image and a query text using a gated-attention mechanism and
learns to predict instance grasping affordances. To train the joint embedding
space of visual and textual attributes, the robot utilizes object persistence
before and after grasping. Our model is trained with self-supervision in a simulation that
uses only basic objects of various colors and shapes, yet generalizes to novel
objects in new environments. To further facilitate generalization, we propose
two adaptation methods: adversarial adaptation and one-grasp adaptation.
Adversarial adaptation regulates the image encoder using augmented data of
unlabeled images, whereas one-grasp adaptation updates the overall end-to-end
model using augmented data from one grasp trial. Both adaptation methods are
data-efficient and considerably improve instance grasping performance.
Experimental results in both simulation and the real world demonstrate that our
approach achieves an instance grasping success rate of over 81% on unknown objects,
outperforming several baselines by large margins.
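
The gated-attention fusion mentioned in the abstract can be illustrated with a minimal sketch. It assumes the common formulation in which the text embedding is projected to one sigmoid gate per visual channel and multiplied element-wise into the image feature map; the abstract does not specify the exact layers, so the function name `gated_attention`, the parameters `weight` and `bias`, and all tensor shapes below are hypothetical.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

def gated_attention(visual_feats, text_embedding, weight, bias):
    """Fuse an image feature map with a text embedding (illustrative only).

    visual_feats:   (C, H, W) feature map from an image encoder (assumed shape)
    text_embedding: (D,) embedding of the query text (assumed shape)
    weight, bias:   (C, D) and (C,) parameters of a hypothetical linear layer
    """
    # Project the text embedding to one gate in (0, 1) per visual channel.
    gates = sigmoid(weight @ text_embedding + bias)   # shape (C,)
    # Broadcast the gates over the spatial dimensions and rescale each channel.
    return visual_feats * gates[:, None, None]        # shape (C, H, W)

# Toy example with random inputs; the sizes are arbitrary.
rng = np.random.default_rng(0)
C, H, W, D = 8, 4, 4, 6
visual_feats = rng.standard_normal((C, H, W))
text_embedding = rng.standard_normal(D)
weight = rng.standard_normal((C, D))
bias = np.zeros(C)

fused = gated_attention(visual_feats, text_embedding, weight, bias)
```

In a full model, the fused map would be passed to a decoder that predicts per-pixel grasping affordances; that stage is omitted here because the abstract gives no details about it.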