Tactile Beyond Pixels: Multisensory Touch Representations for Robot Manipulation

Journal: arXiv
Published Date:

Abstract

We present Sparsh-X, the first multisensory touch representation spanning four tactile modalities: image, audio, motion, and pressure. Trained on ~1M contact-rich interactions collected with the Digit 360 sensor, Sparsh-X captures complementary touch signals at diverse temporal and spatial scales. Using self-supervised learning, Sparsh-X fuses these modalities into a unified representation that encodes physical properties useful for robot manipulation. We study how to effectively integrate real-world touch representations into both imitation learning and tactile adaptation of sim-trained policies, showing that Sparsh-X boosts policy success rates by 63% over an end-to-end model using tactile images and improves robustness in recovering object states from touch by 90%. Finally, we benchmark Sparsh-X's ability to infer physical properties through tasks such as object-action identification, material-quantity estimation, and force estimation. Sparsh-X improves accuracy in characterizing physical properties by 48% compared to end-to-end approaches, demonstrating the advantages of multisensory pretraining for capturing features essential to dexterous manipulation.
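
The abstract does not say how the four modalities are combined, so the sketch below is only a toy illustration of the general idea of fusing heterogeneous per-modality features into one shared vector. It is not the Sparsh-X architecture. The feature sizes, the fixed random projections standing in for learned encoders, and the simple mean fusion are all hypothetical choices made for illustration.

```python
import random
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def make_projection(in_dim: int, out_dim: int, seed: int) -> Matrix:
    """Hypothetical fixed linear map (out_dim x in_dim) standing in for a learned encoder."""
    rng = random.Random(seed)
    scale = 1.0 / in_dim ** 0.5
    return [[rng.gauss(0.0, scale) for _ in range(in_dim)] for _ in range(out_dim)]


def project(weights: Matrix, x: Vector) -> Vector:
    """Matrix-vector product: maps one modality's features into the shared space."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def fuse(features: Dict[str, Vector], projections: Dict[str, Matrix]) -> Vector:
    """Mean-pool the projected modality embeddings into one unified vector (assumed fusion rule)."""
    embeddings = [project(projections[name], x) for name, x in features.items()]
    dim = len(embeddings[0])
    return [sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)]


SHARED_DIM = 8
# Hypothetical per-modality feature sizes; not taken from the paper.
MODALITY_DIMS = {"image": 32, "audio": 16, "motion": 6, "pressure": 1}
projections = {
    name: make_projection(d, SHARED_DIM, seed=i)
    for i, (name, d) in enumerate(MODALITY_DIMS.items())
}

sample = {name: [0.1] * d for name, d in MODALITY_DIMS.items()}
z = fuse(sample, projections)
print(len(z))  # 8
```

Mean pooling is used here only because it is the simplest rule that maps inputs of different sizes to a single fixed-size vector. A pretrained model would typically learn both the encoders and the fusion step from data rather than fixing them as done above.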

Authors

  • Carolina Higuera
  • Akash Sharma
  • Taosha Fan
  • Chaithanya Krishna Bodduluri
  • Byron Boots
  • Michael Kaess
  • Mike Lambeta
  • Tingfan Wu
  • Zixi Liu
  • Francois Robert Hogan
  • Mustafa Mukadam

Categories