Every Image Listens, Every Image Dances: Music-Driven Image Animation
Journal: arXiv
Published Date: Jan 30, 2025
Abstract
Image animation has become a promising area in multimodal research, with a
focus on generating videos from reference images. While prior work has largely
emphasized generic video generation guided by text, music-driven dance video
generation remains underexplored. In this paper, we introduce MuseDance, an
end-to-end model that animates reference images using both music and
text inputs. This dual input enables MuseDance to generate personalized videos
that follow text descriptions and synchronize character movements with the
music. Unlike existing approaches, MuseDance eliminates the need for complex
motion guidance inputs, such as pose or depth sequences, making flexible and
creative video generation accessible to users of all expertise levels. To
advance research in this field, we present a new multimodal dataset comprising
2,904 dance videos with corresponding background music and text descriptions.
Our approach leverages diffusion-based methods to achieve robust
generalization, precise control, and temporal consistency, setting a new
baseline for the music-driven image animation task.