HybridGen: VLM-Guided Hybrid Planning for Scalable Data Generation of Imitation Learning

Journal: arXiv

Published Date: Mar 17, 2025

Abstract

The acquisition of large-scale and diverse demonstration data is essential for improving the generalization of robotic imitation learning. However, generating such data for complex manipulation tasks is challenging in real-world settings. We introduce HybridGen, an automated framework that integrates a Vision-Language Model (VLM) with hybrid planning. HybridGen uses a two-stage pipeline. In the first stage, a VLM parses expert demonstrations and decomposes each task into expert-dependent segments (handled with object-centric pose transformations for precise control) and plannable segments (for which diverse trajectories are synthesized via path planning). In the second stage, pose transformations substantially expand the first-stage data. Crucially, HybridGen generates a large volume of training data without requiring a specific data format, making it broadly applicable to a wide range of imitation learning algorithms, a characteristic we also demonstrate empirically across multiple algorithms. Evaluations across seven tasks and their variants show that agents trained with HybridGen achieve substantial performance and generalization gains, averaging a 5% improvement over state-of-the-art methods. Notably, in the most challenging task variants, HybridGen reaches a 59.7% average success rate, significantly outperforming MimicGen's 49.5%. These results demonstrate its effectiveness and practicality.
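
The object-centric pose transformation mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not describe HybridGen's implementation, so everything here is an assumption made for illustration: the function names (`compose`, `invert`, `retarget_segment`) are hypothetical, and poses are simplified to planar SE(2) tuples `(x, y, theta)` rather than the full 6-DoF poses a manipulator would use. The idea shown is only the general one: express each end-effector waypoint relative to the object in the source demonstration, then re-express it relative to the object's new pose.

```python
import math

def compose(a, b):
    """Compose two SE(2) poses: interpret pose b in the frame of pose a."""
    ax, ay, at = a
    bx, by, bt = b
    c, s = math.cos(at), math.sin(at)
    return (ax + c * bx - s * by, ay + s * bx + c * by, at + bt)

def invert(p):
    """Inverse of an SE(2) pose, so that compose(invert(p), p) is the identity."""
    x, y, t = p
    c, s = math.cos(t), math.sin(t)
    return (-(c * x + s * y), s * x - c * y, -t)

def retarget_segment(eef_waypoints, obj_pose_src, obj_pose_new):
    """Map end-effector waypoints from a source demo onto a new object pose.

    Hypothetical helper: each waypoint is first expressed in the source
    object's frame, then placed back in the world using the new object pose,
    so the motion relative to the object is preserved.
    """
    src_inv = invert(obj_pose_src)
    return [compose(obj_pose_new, compose(src_inv, w)) for w in eef_waypoints]

if __name__ == "__main__":
    # Source demo: object at (1, 0), gripper waypoint 0.5 to its left.
    waypoints = [(1.0, 0.5, 0.0)]
    new_waypoints = retarget_segment(
        waypoints,
        obj_pose_src=(1.0, 0.0, 0.0),
        obj_pose_new=(2.0, 1.0, math.pi / 2),
    )
    print(new_waypoints)
```

In this example the waypoint keeps its offset of 0.5 along the object's local y-axis, so after the object is moved and rotated by 90 degrees the waypoint lands at approximately (1.5, 1.0) with the same 90-degree rotation applied.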

Authors

Wensheng Wang
Ning Tan

External Resources

View on arXiv (http://arxiv.org/abs/2503.13171v1)
