FreePCA: Integrating Consistency Information across Long-short Frames in Training-free Long Video Generation via Principal Component Analysis
Journal:
arXiv
Published Date:
May 2, 2025
Abstract
Long video generation aims to produce extended videos using models
trained on short videos, but it suffers from distribution shifts caused by the
change in frame count. It requires local information from the original
short frames to preserve visual and motion quality, and global information from
the entire long sequence to ensure appearance consistency. Existing training-free
methods struggle to combine the benefits of both because appearance
and motion in videos are closely coupled, which leads to motion inconsistency and
degraded visual quality. In this paper, we reveal that global and local information can
be precisely decoupled into consistent appearance and motion intensity
information by applying Principal Component Analysis (PCA), allowing for
refined complementary integration of global consistency and local quality. With
this insight, we propose FreePCA, a training-free long video generation
paradigm based on PCA that simultaneously achieves high consistency and
quality. Concretely, we decouple consistent appearance and motion intensity
features by measuring cosine similarity in the principal component space.
Critically, we progressively integrate these features to preserve original
quality and ensure smooth transitions, while further enhancing consistency by
reusing the mean statistics of the initial noise. Experiments demonstrate that
FreePCA can be applied to various video diffusion models without requiring
training, leading to substantial improvements. Code is available at
https://github.com/JosephTiTan/FreePCA.
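
The decoupling step described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation (see the repository above for that). It is a minimal toy version under stated assumptions: features are given as (frames, dims) arrays, the principal axes are taken from the local features, and the components whose global and local trajectories have the highest cosine similarity are treated as "consistent appearance" and taken from the global branch. The function name `fuse_by_pca` and the parameter `n_consistent` are hypothetical and chosen only for illustration.

```python
import numpy as np


def cosine(a, b, eps=1e-8):
    """Cosine similarity between two 1-D vectors (eps avoids division by zero)."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + eps))


def fuse_by_pca(x_global, x_local, n_consistent=1):
    """Toy fusion of global and local features in a principal component space.

    x_global, x_local: arrays of shape (frames, dims).
    n_consistent: hypothetical knob, number of components taken from x_global.
    Returns (fused features, per-component similarities, chosen component indices).
    """
    # Assumption: principal axes are estimated from the local features.
    mean = x_local.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x_local - mean, full_matrices=False)

    # Project both feature sets onto the same principal axes.
    z_global = (x_global - mean) @ vt.T
    z_local = (x_local - mean) @ vt.T

    # Compare each component's trajectory over frames via cosine similarity.
    sims = np.array(
        [cosine(z_global[:, j], z_local[:, j]) for j in range(vt.shape[0])]
    )

    # Treat the most similar components as "consistent appearance" and take
    # them from the global branch; all other components stay local.
    keep = np.argsort(-sims)[:n_consistent]
    z_fused = z_local.copy()
    z_fused[:, keep] = z_global[:, keep]

    # Map back to the original feature space.
    fused = z_fused @ vt + mean
    return fused, sims, keep
```

With `n_consistent=0` the function returns the local features unchanged (up to floating-point error), which makes it easy to check how much each swapped component alters the result. How the components are actually selected and progressively integrated across denoising steps in FreePCA is specified in the paper and code, not in this sketch.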