Multi-Keypoint Affordance Representation for Functional Dexterous Grasping
Journal: arXiv
Published Date: Feb 27, 2025
Abstract
Functional dexterous grasping requires precise hand-object interaction, going
beyond simple gripping. Existing affordance-based methods primarily predict
coarse interaction regions and cannot directly constrain the grasping posture,
leading to a disconnection between visual perception and manipulation. To
address this issue, we propose a multi-keypoint affordance representation for
functional dexterous grasping, which directly encodes task-driven grasp
configurations by localizing functional contact points. Our method introduces
Contact-guided Multi-Keypoint Affordance (CMKA), which uses images of human
grasping experience as weak supervision and Large Vision Models for
fine-grained affordance feature extraction, so that it generalizes without
manual keypoint annotations. Additionally, we present a Keypoint-based Grasp
matrix Transformation (KGT) method, ensuring spatial consistency between hand
keypoints and object contact points, thus providing a direct link between
visual perception and dexterous grasping actions. Experiments on public
real-world FAH datasets, IsaacGym simulation, and challenging robotic tasks
demonstrate that our method significantly improves affordance localization
accuracy, grasp consistency, and generalization to unseen tools and tasks,
bridging the gap between visual affordance learning and dexterous robotic
manipulation. The source code and demo videos will be publicly available at
https://github.com/PopeyePxx/MKA.
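
The abstract does not specify how KGT computes its transformation. As an illustration only, the idea of keeping hand keypoints spatially consistent with object contact points can be sketched as a rigid alignment: build an orthonormal frame from three hand keypoints, build another from the three matching object contact points, and compose the two. Everything below (the function names, the three-keypoint assumption, the frame construction) is a hypothetical sketch and not the authors' implementation.

```python
import math

def _sub(a, b):
    return [a[i] - b[i] for i in range(3)]

def _cross(a, b):
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]

def _unit(a):
    n = math.sqrt(sum(c * c for c in a))
    if n < 1e-12:
        raise ValueError("keypoints are coincident or collinear")
    return [c / n for c in a]

def frame_from_keypoints(p0, p1, p2):
    """Hypothetical helper: right-handed frame from three non-collinear points.

    Returns (R, origin), where the columns of R are the frame axes.
    """
    x = _unit(_sub(p1, p0))
    z = _unit(_cross(x, _sub(p2, p0)))
    y = _cross(z, x)
    R = [[x[i], y[i], z[i]] for i in range(3)]
    return R, list(p0)

def align_hand_to_object(hand_kps, obj_kps):
    """Return (R, t) such that R @ h + t maps hand keypoints onto object points.

    The mapping is exact only when the two keypoint triangles are congruent;
    this is an assumption of the sketch, not a claim about KGT.
    """
    Rh, oh = frame_from_keypoints(*hand_kps)
    Ro, oo = frame_from_keypoints(*obj_kps)
    # R = Ro @ Rh^T (Rh is orthonormal, so its transpose is its inverse)
    R = [[sum(Ro[i][k] * Rh[j][k] for k in range(3)) for j in range(3)]
         for i in range(3)]
    t = [oo[i] - sum(R[i][j] * oh[j] for j in range(3)) for i in range(3)]
    return R, t

def apply_transform(R, t, p):
    """Apply the rigid transform (R, t) to a single 3D point."""
    return [sum(R[i][j] * p[j] for j in range(3)) + t[i] for i in range(3)]

# Made-up example: object contacts are the hand keypoints rotated 90 degrees
# about the z-axis and shifted by (1, 2, 3).
hand = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
obj = [[1.0, 2.0, 3.0], [1.0, 3.0, 3.0], [0.0, 2.0, 3.0]]
R, t = align_hand_to_object(hand, obj)
mapped = [apply_transform(R, t, h) for h in hand]
```

With real detections the two keypoint sets would rarely be congruent, so a least-squares fit over more than three correspondences would likely be a more robust choice than this exact three-point construction.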