ACT360: An Efficient 360-Degree Action Detection and Summarization Framework for Mission-Critical Training and Debriefing
Journal: arXiv
Published Date: Mar 17, 2025
Abstract
Effective training and debriefing are critical in high-stakes,
mission-critical environments such as disaster response, military simulations,
and industrial safety, where precision and error minimization are paramount.
Traditional post-training analysis relies on manually reviewing 2D videos, a
time-consuming process that lacks comprehensive situational awareness. To
address these limitations, we introduce ACT360, a system that leverages
360-degree videos and machine learning for automated action detection and
structured debriefing. ACT360 integrates 360YOWO, an enhanced You Only Watch
Once (YOWO) model with spatial attention and equirectangular-aware convolution
(EAC) to mitigate panoramic video distortions. To enable deployment in
resource-constrained environments, we apply quantization and model pruning,
reducing the model size by 74% while maintaining robust accuracy (mAP drop of
only 0.015, i.e. 1.5 percentage points, from 0.865 to 0.850) and improving
inference speed. We validate our
approach on a publicly available dataset of 55 labeled 360-degree videos
covering seven key operational actions, recorded across various real-world
training sessions and environmental conditions. Additionally, ACT360 integrates
360AIE (Action Insight Explorer), a web-based interface for automatic action
detection, retrieval, and textual summarization using large language models
(LLMs), significantly enhancing post-incident analysis efficiency. ACT360
serves as a generalized framework for mission-critical debriefing,
incorporating EAC, spatial attention, summarization, and model optimization.
These innovations apply to any training environment requiring lightweight
action detection and structured post-exercise analysis.
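
The panoramic distortion that motivates EAC can be illustrated with a short sketch. This is not the EAC layer from the paper, whose details the abstract does not give. It only uses the standard geometry of the equirectangular projection: each image row corresponds to a latitude, and content on that row is stretched horizontally by 1 / cos(latitude). The function name `horizontal_stretch` and the frame height of 960 pixels are hypothetical and chosen only for illustration.

```python
import math


def horizontal_stretch(row: int, height: int) -> float:
    """Horizontal stretch factor for one row of an equirectangular frame.

    Row centres map linearly to latitudes between +90 and -90 degrees.
    A circle of latitude phi has circumference proportional to cos(phi),
    yet every row has the same pixel width, so content on that row is
    stretched by 1 / cos(phi).
    """
    latitude = (0.5 - (row + 0.5) / height) * math.pi  # radians
    return 1.0 / math.cos(latitude)


height = 960  # hypothetical frame height in pixels (assumption)
for row in (height // 2, height // 4, height // 20):
    factor = horizontal_stretch(row, height)
    print(f"row {row:4d}: horizontal stretch x{factor:.2f}")
```

Rows near the equator are almost undistorted, while rows near the poles are stretched many times over. A fixed square convolution kernel therefore covers very different amounts of the real scene depending on the row, which is the kind of mismatch a distortion-aware convolution is meant to reduce.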