HOIDiNi: Human-Object Interaction through Diffusion Noise Optimization
Journal: arXiv
Published Date: Jun 18, 2025
Abstract
We present HOIDiNi, a text-driven diffusion framework for synthesizing
realistic and plausible human-object interaction (HOI). HOI generation is
extremely challenging because it demands strict contact accuracy alongside a
diverse motion manifold. While current literature trades off between realism
and physical correctness, HOIDiNi optimizes directly in the noise space of a
pretrained diffusion model using Diffusion Noise Optimization (DNO), achieving
both. This is feasible because, as we observe, the problem separates into
two phases: an object-centric phase that primarily makes discrete
choices of hand-object contact locations, and a human-centric phase that
refines the full-body motion to realize this blueprint. This structured
approach allows for precise hand-object contact without compromising motion
naturalness. Quantitative, qualitative, and subjective evaluations on the GRAB
dataset alone indicate that HOIDiNi outperforms prior work and baselines in
contact accuracy, physical validity, and overall quality. Our results
demonstrate the ability to generate complex, controllable interactions,
including grasping, placing, and full-body coordination, driven solely by
textual prompts. Project page: https://hoidini.github.io.
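
To make the idea of optimizing in noise space concrete, the following is a minimal toy sketch, not the authors' implementation. It assumes a frozen, differentiable, deterministic sampler (here a hypothetical `toy_sampler`, a fixed random linear map followed by `tanh`, standing in for an unrolled pretrained diffusion sampler) and a hypothetical `contact_loss` that pulls two output coordinates toward target values. Only the input noise `z` is updated; the sampler's parameters never change. All names, dimensions, and hyperparameters are illustrative assumptions.

```python
import numpy as np


def toy_sampler(z, weights, bias):
    """Hypothetical stand-in for a frozen, deterministic diffusion sampler."""
    return np.tanh(weights @ z) + bias


def contact_loss(x, target, idx):
    """Squared distance between selected output coordinates and their targets."""
    diff = x[idx] - target
    return float(diff @ diff)


def optimize_noise(z0, weights, bias, target, idx, steps=500, lr=0.05):
    """Gradient descent on the noise z only; the sampler stays frozen."""
    z = z0.copy()
    history = []
    for _ in range(steps):
        pre = weights @ z
        x = np.tanh(pre) + bias
        history.append(contact_loss(x, target, idx))
        # Manual backpropagation through the frozen sampler: dL/dx, then dL/dz.
        grad_x = np.zeros_like(x)
        grad_x[idx] = 2.0 * (x[idx] - target)
        grad_z = weights.T @ (grad_x * (1.0 - np.tanh(pre) ** 2))
        z -= lr * grad_z
    history.append(contact_loss(toy_sampler(z, weights, bias), target, idx))
    return z, history


# Illustrative setup (all values are assumptions, not taken from the paper).
rng = np.random.default_rng(0)
dim = 8
weights = rng.normal(size=(dim, dim)) / np.sqrt(dim)  # frozen "model" parameters
bias = np.zeros(dim)
z0 = rng.normal(size=dim)            # initial Gaussian noise
idx = np.array([0, 1])               # coordinates treated as "contact" outputs
target = np.array([0.3, -0.2])       # desired values for those coordinates

weights_before = weights.copy()
z_opt, history = optimize_noise(z0, weights, bias, target, idx)
print(f"loss before: {history[0]:.4f}, loss after: {history[-1]:.6f}")
```

In a real setting the sampler would be a pretrained motion diffusion model differentiated with an autodiff framework, and the loss would encode contact and physical-validity terms; the sketch only shows the general pattern of descending on the noise while the model stays fixed.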