An explainable deep learning framework for video violence detection using unsupervised keyframe selection and attention-based CNN.
Journal:
Scientific Reports
Published Date:
Feb 26, 2026
Abstract
The exponential growth of video data from surveillance and online platforms has heightened the demand for intelligent, explainable systems capable of detecting violence in real time. This study proposes a novel Explainable Attention-Enhanced Convolutional Neural Network (CNN) framework that integrates unsupervised keyframe selection, attention-driven feature learning, and Grad-CAM++-based interpretability to address redundancy, transparency, and generalization challenges in video violence detection. The proposed model automatically extracts representative keyframes using similarity-based clustering, reducing computational overhead while retaining essential temporal information. Attention modules are embedded within the CNN backbone to enhance spatiotemporal feature discrimination, while Grad-CAM++ provides interpretable visual insights into the model's decision process. Comprehensive experiments on five benchmark datasets (RLVS, Hockey Fight, Violent Flow, ShanghaiTech, and UCF-Crime) demonstrate that the framework achieves an average accuracy of 94.6% and an F1-score of 93.9%, outperforming state-of-the-art models such as C3D, I3D, ResNet-LSTM, and ViViT. The model also delivers near-real-time efficiency (≈ 62 FPS) with reduced memory utilization (6.8 GB), supporting its suitability for deployment in surveillance and public safety systems. Statistical analysis using ANOVA and Tukey's HSD tests verified that keyframe selection and attention modules significantly improve performance (p < 0.05) with large effect sizes (η² = 0.76). The integration of interpretability further enhances reliability by localizing violence-relevant regions in frames. Overall, the proposed explainable framework offers a robust, efficient, and transparent solution for automated violence detection in diverse real-world scenarios.
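The abstract does not say which features or clustering algorithm drive the keyframe selection, so the following is only a minimal sketch of one way similarity-based selection could work. Everything in it is an assumption made for illustration: the function names `frame_histogram` and `select_keyframes`, the use of grayscale intensity histograms as frame features, the sequential (order-preserving) clustering rule, and the cosine-similarity threshold of 0.9. It should not be read as the authors' method.

```python
import numpy as np


def frame_histogram(frame: np.ndarray, bins: int = 16) -> np.ndarray:
    """L2-normalised intensity histogram of one frame (pixel values 0..255).

    Assumption: a plain intensity histogram stands in for whatever
    frame descriptor the paper actually uses.
    """
    hist, _ = np.histogram(frame, bins=bins, range=(0, 256))
    hist = hist.astype(np.float64)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def select_keyframes(frames: np.ndarray, threshold: float = 0.9) -> list[int]:
    """Return one representative frame index per similarity cluster.

    Frames are scanned in temporal order. A frame joins the current cluster
    while its cosine similarity to that cluster's mean histogram is at least
    `threshold`; otherwise it opens a new cluster. The keyframe of a cluster
    is the member closest to the cluster mean.
    """
    if len(frames) == 0:
        return []

    feats = np.stack([frame_histogram(f) for f in frames])
    clusters = [[0]]
    for i in range(1, len(feats)):
        centroid = feats[clusters[-1]].mean(axis=0)
        denom = np.linalg.norm(centroid) * np.linalg.norm(feats[i]) + 1e-12
        similarity = float(feats[i] @ centroid) / denom
        if similarity >= threshold:
            clusters[-1].append(i)
        else:
            clusters.append([i])

    keyframes = []
    for members in clusters:
        centroid = feats[members].mean(axis=0)
        dists = np.linalg.norm(feats[members] - centroid, axis=1)
        keyframes.append(members[int(np.argmin(dists))])
    return keyframes
```

Given a clip stored as an array of shape `(num_frames, height, width)`, `select_keyframes(clip)` returns the indices of the frames that would be passed on to the classifier. Raising `threshold` yields more, finer-grained clusters and therefore more keyframes; lowering it discards more redundant frames at the risk of merging distinct scenes.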