Comparative Analysis of Image, Video, and Audio Classifiers for Automated News Video Segmentation
Journal: arXiv
Published Date: Mar 27, 2025
Abstract
News videos require efficient content organisation and retrieval systems, but
their unstructured nature poses significant challenges for automated
processing. This paper presents a comparative analysis of image, video, and
audio classifiers for automated news video segmentation. We develop and
evaluate multiple deep learning approaches, including ResNet, ViViT, AST, and
multimodal architectures, to classify five distinct segment types:
advertisements, stories, studio scenes, transitions,
and visualisations. Using a custom-annotated dataset of 41 news videos
comprising 1,832 scene clips, our experiments demonstrate that image-based
classifiers achieve superior performance (84.34% accuracy) compared to more
complex temporal models. Notably, the ResNet architecture outperformed
state-of-the-art video classifiers while requiring significantly fewer
computational resources. Binary classification models achieved high accuracy
for transitions (94.23%) and advertisements (92.74%). These findings advance
the understanding of effective architectures for news video segmentation and
provide practical insights for implementing automated content organisation
systems in media applications such as media archiving, personalised content
delivery, and intelligent video search.
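
To make the two evaluation settings in the abstract concrete, the following is a minimal sketch of how five-class and binary accuracy could be computed over annotated clips. It assumes the binary models are one-vs-rest tasks derived from the five-class labels, which the abstract does not state. The names `SEGMENT_TYPES`, `to_binary`, and `accuracy` and the toy labels are hypothetical and are not taken from the paper.

```python
from typing import Sequence

# The five segment types named in the abstract.
SEGMENT_TYPES = ("advertisement", "story", "studio", "transition", "visualisation")


def to_binary(labels: Sequence[str], positive: str) -> list[int]:
    """Collapse five-class labels into one-vs-rest targets (1 = positive class).

    Assumption: the abstract does not say how the binary tasks were built.
    """
    if positive not in SEGMENT_TYPES:
        raise ValueError(f"unknown segment type: {positive!r}")
    return [1 if label == positive else 0 for label in labels]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of clips whose predicted label matches the annotation."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy annotations and predictions for four hypothetical clips (not paper data).
annotated = ["story", "transition", "studio", "advertisement"]
predicted = ["story", "transition", "story", "advertisement"]

# Five-class accuracy: compare class indices clip by clip.
multiclass_acc = accuracy(
    [SEGMENT_TYPES.index(a) for a in annotated],
    [SEGMENT_TYPES.index(p) for p in predicted],
)

# Binary accuracy for the "transition vs. everything else" task.
transition_acc = accuracy(
    to_binary(annotated, "transition"),
    to_binary(predicted, "transition"),
)
```

In this toy case a studio clip mislabelled as a story lowers the five-class accuracy but leaves the transition-versus-rest accuracy untouched, which illustrates why the binary figures can sit above the multiclass one.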