TextMesh4D: High-Quality Text-to-4D Mesh Generation
Journal:
arXiv
Published Date:
Jun 30, 2025
Abstract
Recent advancements in diffusion generative models significantly advanced
image, video, and 3D content creation from user-provided text prompts. However,
the challenging problem of dynamic 3D content generation (text-to-4D) with
diffusion guidance remains largely unexplored. In this paper, we introduce
TextMesh4D, a novel framework for high-quality text-to-4D generation. Our
approach leverages per-face Jacobians as a differentiable mesh representation
and decomposes 4D generation into two stages: static object creation and
dynamic motion synthesis. We further propose a flexibility-rigidity
regularization term to stabilize Jacobian optimization under video diffusion
priors, ensuring robust geometric performance. Experiments demonstrate that
TextMesh4D achieves state-of-the-art results in terms of temporal consistency,
structural fidelity, and visual realism. Moreover, TextMesh4D operates with a
low GPU memory overhead-requiring only a single 24GB GPU-offering a
cost-effective yet high-quality solution for text-driven 4D mesh generation.
The code will be released to facilitate future research in text-to-4D
generation.