MAGIC: Motion-Aware Generative Inference via Confidence-Guided LLM
Journal:
arXiv
Published Date:
May 22, 2025
Abstract
Recent advances in static 3D generation have intensified the demand for
physically consistent dynamic 3D content. However, existing video generation
models, including diffusion-based methods, often prioritize visual realism
while neglecting physical plausibility, resulting in implausible object
dynamics. Prior approaches for physics-aware dynamic generation typically rely
on large-scale annotated datasets or extensive model fine-tuning, imposing
significant computational and data-collection burdens and limiting scalability
across scenarios. To address these challenges, we present MAGIC, a
training-free framework for single-image physical property inference and
dynamic generation, integrating pretrained image-to-video diffusion models with
iterative LLM-based reasoning. Our framework generates motion-rich videos from
a static image and closes the visual-to-physical gap through a
confidence-driven LLM feedback loop that adaptively steers the diffusion model
toward physics-relevant motion. To translate visual dynamics into controllable
physical behavior, we further introduce a differentiable Material Point Method
(MPM) simulator operating directly on 3D Gaussians reconstructed from the
single image,
enabling physically grounded, simulation-ready outputs without any supervision
or model tuning. Experiments show that MAGIC outperforms existing physics-aware
generative methods in inference accuracy and achieves greater temporal
coherence than state-of-the-art video diffusion models.
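
The confidence-driven feedback loop described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `Estimate` container, the `generate_video` and `query_llm` callables, the confidence threshold, and the way feedback is appended to the prompt are all hypothetical placeholders, assumed here only to show the control flow of generating a video, asking an LLM for physical properties with a confidence score, and regenerating until the confidence is high enough.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Estimate:
    """Hypothetical container for one round of LLM output."""
    properties: dict   # inferred physical properties (contents are assumed)
    confidence: float  # self-reported confidence in [0, 1]
    feedback: str      # hint used to refine the next video prompt


def confidence_guided_loop(
    image: object,
    generate_video: Callable[[object, str], object],  # stand-in for an image-to-video model
    query_llm: Callable[[object], Estimate],          # stand-in for the LLM reasoning step
    initial_prompt: str,
    threshold: float = 0.8,   # assumed stopping confidence
    max_rounds: int = 5,      # assumed iteration budget
) -> Estimate:
    """Regenerate the video until the LLM's confidence reaches the threshold.

    Returns the most confident estimate seen within the iteration budget.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")

    prompt = initial_prompt
    best: Optional[Estimate] = None
    for _ in range(max_rounds):
        video = generate_video(image, prompt)
        estimate = query_llm(video)
        # Keep the most confident estimate in case the threshold is never met.
        if best is None or estimate.confidence > best.confidence:
            best = estimate
        if estimate.confidence >= threshold:
            break
        # Steer the next generation using the LLM's feedback.
        prompt = f"{initial_prompt} {estimate.feedback}"
    return best
```

In a full pipeline of the kind the abstract describes, the returned properties would then parameterize the simulator; that step is outside the scope of this sketch.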