Articulate That Object Part (ATOP): 3D Part Articulation via Text and Motion Personalization
Journal: arXiv
Published Date: Feb 11, 2025
Abstract
We present ATOP (Articulate That Object Part), a novel few-shot method based
on motion personalization to articulate a static 3D object with respect to a
part and its motion as prescribed in a text prompt. Given the scarcity of
available datasets with motion attribute annotations, existing methods struggle
to generalize well in this task. In our work, the text input allows us to tap
into the power of modern-day diffusion models to generate plausible motion
samples for the right object category and part. In turn, the input 3D object
provides image prompting to personalize the generated video to that very object
we wish to articulate. Our method starts with few-shot finetuning for
category-specific motion generation, a key first step that compensates for the
lack of articulation awareness in current diffusion models. For this, we
finetune a pre-trained multi-view image generation model for controllable
multi-view video generation, using a small collection of video samples obtained
for the target object category. This is followed by motion video
personalization, realized through multi-view rendered images of the target 3D
object. Finally, we transfer the personalized video motion to the target 3D
object via differentiable rendering, optimizing the part motion parameters with
a score distillation sampling loss. Experimental results on PartNet-Sapien and
ACD datasets show that our method generates realistic motion videos and
predicts 3D motion parameters more accurately and with better generalization
than prior work in the few-shot setting.
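
To make the notion of "part motion parameters" concrete, the following is a minimal sketch of how a part's points could be moved once such parameters are known. The abstract does not state the parameterization, so this assumes two common joint types: a revolute joint (axis, pivot, angle) and a prismatic joint (axis, displacement). The function names `rotate_about_axis` and `translate_along_axis` are hypothetical and are not taken from the authors' implementation. The sketch covers only the forward articulation step, not the differentiable rendering or the score distillation sampling optimization.

```python
import math


def _dot(a, b):
    """Dot product of two 3D vectors."""
    return sum(x * y for x, y in zip(a, b))


def _cross(a, b):
    """Cross product of two 3D vectors."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]


def _normalize(axis):
    """Return the unit vector along `axis`."""
    norm = math.sqrt(_dot(axis, axis))
    if norm == 0.0:
        raise ValueError("joint axis must be non-zero")
    return [c / norm for c in axis]


def rotate_about_axis(point, axis, pivot, angle):
    """Revolute joint (assumed parameterization): rotate `point` by `angle`
    radians about the line through `pivot` with direction `axis`,
    using Rodrigues' rotation formula."""
    k = _normalize(axis)
    v = [p - q for p, q in zip(point, pivot)]  # point relative to the pivot
    k_cross_v = _cross(k, v)
    k_dot_v = _dot(k, v)
    c, s = math.cos(angle), math.sin(angle)
    rotated = [
        v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c)
        for i in range(3)
    ]
    return [r + q for r, q in zip(rotated, pivot)]


def translate_along_axis(point, axis, distance):
    """Prismatic joint (assumed parameterization): slide `point` by
    `distance` along the unit direction of `axis`."""
    k = _normalize(axis)
    return [p + distance * c for p, c in zip(point, k)]
```

For example, a cabinet door could be modeled by applying `rotate_about_axis` to every vertex of the door part with the hinge line as the axis, while a drawer could use `translate_along_axis` with the pull direction as the axis.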