Temporal-Guided Spiking Neural Networks for Event-Based Human Action Recognition
Journal: arXiv
Published Date: Mar 21, 2025
Abstract
This paper explores the promising interplay between spiking neural networks
(SNNs) and event-based cameras for privacy-preserving human action recognition
(HAR). Event cameras capture only the outlines of motion, and SNNs are well
suited to processing spatiotemporal data through spikes; together, these
properties make the two highly compatible for event-based HAR. Previous
studies, however, have been constrained by SNNs' limited ability to process
long-term temporal information, which is essential for precise HAR. In this paper, we
introduce two novel frameworks to address this: temporal segment-based SNN
(\textit{TS-SNN}) and 3D convolutional SNN (\textit{3D-SNN}). The
\textit{TS-SNN} extracts long-term temporal information by dividing actions
into shorter segments, while the \textit{3D-SNN} replaces 2D spatial elements
with 3D components to facilitate the transmission of temporal information. To
promote further research in event-based HAR, we create a new dataset,
\textit{FallingDetection-CeleX}, collected with the high-resolution CeleX-V
event camera $(1280 \times 800)$; it comprises 7 distinct actions. Extensive
experimental results show that our proposed frameworks surpass state-of-the-art
SNN methods on our newly collected dataset and three other neuromorphic
datasets, showcasing their effectiveness in handling long-range temporal
information for event-based HAR.
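The segment-then-fuse idea behind \textit{TS-SNN} can be illustrated with a minimal, self-contained Python sketch. It is not the authors' implementation. Several choices are assumptions made for illustration: the equal-duration split of the event stream, a single leaky integrate-and-fire (LIF) neuron standing in for each segment's SNN, and an average consensus (in the style of temporal segment networks) used to fuse segment outputs. All function names are hypothetical.

```python
from typing import List, Sequence, Tuple

# An event as emitted by an event camera: (timestamp, x, y, polarity).
Event = Tuple[float, int, int, int]


def split_into_segments(events: Sequence[Event], num_segments: int,
                        t_start: float, t_end: float) -> List[List[Event]]:
    """Divide an event stream into equal-duration temporal segments (assumed split)."""
    if num_segments <= 0 or t_end <= t_start:
        raise ValueError("need num_segments > 0 and t_end > t_start")
    width = (t_end - t_start) / num_segments
    segments: List[List[Event]] = [[] for _ in range(num_segments)]
    for ev in events:
        idx = int((ev[0] - t_start) / width)
        # Clamp so an event at exactly t_end lands in the last segment.
        segments[min(max(idx, 0), num_segments - 1)].append(ev)
    return segments


def lif_firing_rate(inputs: Sequence[float], tau: float = 2.0,
                    v_threshold: float = 1.0) -> float:
    """LIF neuron with hard reset to 0; returns spikes per time step."""
    v, spikes = 0.0, 0
    for x in inputs:
        v += (x - v) / tau       # leaky integration toward the current input
        if v >= v_threshold:     # fire, then reset the membrane potential
            spikes += 1
            v = 0.0
    return spikes / len(inputs) if inputs else 0.0


def segment_score(segment: Sequence[Event], num_steps: int,
                  t0: float, t1: float) -> float:
    """Hypothetical per-segment 'SNN': bin events into time steps, drive one LIF neuron."""
    counts = [0.0] * num_steps
    width = (t1 - t0) / num_steps
    for ev in segment:
        step = min(max(int((ev[0] - t0) / width), 0), num_steps - 1)
        counts[step] += 1.0
    return lif_firing_rate(counts)


def ts_consensus(events: Sequence[Event], num_segments: int, t_start: float,
                 t_end: float, num_steps: int = 4) -> Tuple[List[float], float]:
    """Score each segment independently, then fuse by averaging (assumed consensus)."""
    width = (t_end - t_start) / num_segments
    segs = split_into_segments(events, num_segments, t_start, t_end)
    scores = [segment_score(s, num_steps, t_start + i * width, t_start + (i + 1) * width)
              for i, s in enumerate(segs)]
    return scores, sum(scores) / len(scores)


# Dense activity in the first half of the clip, a single event in the second half.
events = [(t, 0, 0, 1) for t in (0.0, 0.1, 0.25, 0.3, 0.5, 0.6, 0.75, 0.8)] + [(1.5, 0, 0, 1)]
print(ts_consensus(events, num_segments=2, t_start=0.0, t_end=2.0))
# -> ([1.0, 0.0], 0.5)
```

Because each segment is scored over a short window and only the per-segment outputs are combined, the neuron never has to carry membrane state across the whole clip. This is one way to read the abstract's claim that splitting actions into shorter segments helps with long-term temporal information.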