ViTaSCOPE: Visuo-tactile Implicit Representation for In-hand Pose and Extrinsic Contact Estimation
Journal: arXiv
Published Date: Jun 13, 2025
Abstract
Mastering dexterous, contact-rich object manipulation demands precise
estimation of both in-hand object poses and external contact
locations, tasks that are particularly challenging due to partial and
noisy observations. We present ViTaSCOPE: Visuo-Tactile Simultaneous Contact
and Object Pose Estimation, an object-centric neural implicit representation
that fuses vision and high-resolution tactile feedback. By representing objects
as signed distance fields and distributed tactile feedback as neural shear
fields, ViTaSCOPE accurately localizes objects and registers extrinsic contacts
onto their 3D geometry as contact fields. Our method enables seamless reasoning
over complementary visuo-tactile cues, leverages simulation for scalable
training, and transfers zero-shot to the real world by bridging the sim-to-real
gap. We evaluate our method through comprehensive simulated and real-world
experiments, demonstrating its capabilities in dexterous manipulation
scenarios.
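The abstract represents objects as signed distance fields (SDFs) and registers extrinsic contacts onto the object's geometry as contact fields. ViTaSCOPE learns these as neural fields. The minimal sketch below instead uses an analytic sphere SDF and a hypothetical table-plane environment, purely to illustrate the underlying geometry. It shows how an in-hand pose estimate (R, t) maps points between the world and object frames, and how a per-surface-point contact likelihood can be read off distances to the environment. The function names, the Gaussian kernel, and `sigma` are assumptions for illustration, not the authors' implementation.

```python
import math
from typing import Callable, List, Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]  # row-major 3x3 rotation matrix


def sphere_sdf(p_o: Vec3, radius: float) -> float:
    """Analytic SDF of a sphere centred at the object-frame origin:
    negative inside, zero on the surface, positive outside."""
    return math.sqrt(p_o[0] ** 2 + p_o[1] ** 2 + p_o[2] ** 2) - radius


def object_to_world(p_o: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Apply the object pose: p_w = R @ p_o + t."""
    return tuple(sum(R[i][j] * p_o[j] for j in range(3)) + t[i] for i in range(3))


def world_to_object(p_w: Vec3, R: Mat3, t: Vec3) -> Vec3:
    """Invert the object pose: p_o = R^T @ (p_w - t)."""
    d = [p_w[i] - t[i] for i in range(3)]
    return tuple(sum(R[j][i] * d[j] for j in range(3)) for i in range(3))


def plane_sdf(p_w: Vec3, height: float = 0.0) -> float:
    """Hypothetical environment: signed distance to a table plane z = height."""
    return p_w[2] - height


def contact_field(
    surface_pts_obj: List[Vec3],
    R: Mat3,
    t: Vec3,
    env_sdf: Callable[[Vec3], float],
    sigma: float = 0.005,
) -> List[float]:
    """Assumed Gaussian contact likelihood per object-surface point: 1.0 where the
    posed surface touches the environment, decaying with distance to it."""
    values = []
    for p_o in surface_pts_obj:
        d = env_sdf(object_to_world(p_o, R, t))
        values.append(math.exp(-((d / sigma) ** 2)))
    return values


if __name__ == "__main__":
    identity: Mat3 = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
    pose_t: Vec3 = (0.0, 0.0, 0.05)  # 5 cm sphere resting on the table
    pts: List[Vec3] = [(0.0, 0.0, -0.05), (0.0, 0.0, 0.05), (0.05, 0.0, 0.0)]
    print(contact_field(pts, identity, pose_t, plane_sdf))
```

With the sphere resting on the plane z = 0, the bottom surface point receives a contact value of 1.0, while the top and side points decay to effectively zero. In the learned setting, the analytic SDFs would be replaced by neural fields and the pose (R, t) would be estimated from visuo-tactile observations.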