Advance Fake Video Detection via Vision Transformers
Journal:
arXiv
Published Date:
Apr 29, 2025
Abstract
Recent advancements in AI-based multimedia generation have enabled the
creation of hyper-realistic images and videos, raising concerns about their
potential use in spreading misinformation. The widespread accessibility of
generative techniques, which allow for the production of fake multimedia from
prompts or existing media, along with their continuous refinement, underscores
the urgent need for highly accurate and generalizable AI-generated media
detection methods, underlined also by new regulations like the European Digital
AI Act. In this paper, we draw inspiration from Vision Transformer (ViT)-based
fake image detection and extend this idea to video. We propose an {original}
%innovative framework that effectively integrates ViT embeddings over time to
enhance detection performance. Our method shows promising accuracy,
generalization, and few-shot learning capabilities across a new, large and
diverse dataset of videos generated using five open source generative
techniques from the state-of-the-art, as well as a separate dataset containing
videos produced by proprietary generative methods.