Low-cost video-based air quality estimation system using structured deep learning with selective state space modeling.
Journal:
Environment International
Published Date:
Apr 26, 2025
Abstract
Air quality is crucial for both public health and environmental sustainability. An efficient and cost-effective model is essential for accurate air quality predictions and proactive pollution control. However, existing research primarily focuses on single static image analysis, which does not account for the dynamic and temporal nature of air pollution. Meanwhile, research on video-based air quality estimation remains limited, particularly in achieving accurate multi-pollutant outputs. This study proposes Air Quality Prediction-Mamba (AQP-Mamba), a video-based deep learning model that integrates a structured Selective State Space Model (SSM) with a selective scan mechanism and a hybrid predictor (HP) to estimate air quality. The spatiotemporal forward and backward SSM dynamically adjusts its parameters based on the input, maintains linear complexity, and effectively captures long-range dependencies by bidirectionally processing spatiotemporal features through four scanning techniques (row-wise, column-wise, and their vertical reversals), allowing the model to accurately track pollutant concentrations and air quality variations over time. The model thus efficiently extracts spatiotemporal features from video and simultaneously performs regression (PM2.5, PM10, and AQI) and classification (AQI category) tasks. A high-quality outdoor hourly air quality dataset (LMSAQV) with 13,176 videos collected from six monitoring stations in Lahore, Pakistan, was used as the case study. The experimental results demonstrate that AQP-Mamba significantly outperforms several state-of-the-art models, including VideoSwin-T, VideoMAE, I3D, VTHCL, and TimeSformer. The proposed model achieves strong regression performance (PM2.5: R² = 0.91, PM10: R² = 0.90, AQI: R² = 0.92) and excellent classification metrics: accuracy (94.57 %), precision (93.86 %), recall (94.20 %), and F1-score (93.44 %). It delivers consistent, real-time performance with a latency of 1.98 s per video, offering an effective, scalable, and cost-efficient solution for multi-pollutant estimation. This approach has the potential to address gaps in air quality data collected by expensive instruments globally.
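To make the described pipeline concrete, the sketch below illustrates, in PyTorch, how a four-direction spatiotemporal scan with a selective state space layer could feed a hybrid regression/classification predictor. It is a minimal illustration of the general technique only: all class and parameter names are hypothetical, the loop-based selective scan stands in for an optimized Mamba-style kernel, and nothing here is taken from the authors' implementation.

```python
# Hypothetical sketch of a four-direction scan + selective SSM + hybrid heads.
# Not the authors' code; dimensions and names are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SelectiveSSM(nn.Module):
    """Toy selective state space layer: B, C, and the step size depend on the input."""
    def __init__(self, dim, state_dim=16):
        super().__init__()
        self.A = nn.Parameter(-torch.rand(dim, state_dim))  # negative values for a stable decay
        self.proj_B = nn.Linear(dim, state_dim)
        self.proj_C = nn.Linear(dim, state_dim)
        self.proj_dt = nn.Linear(dim, dim)

    def forward(self, x):                        # x: (batch, seq_len, dim)
        b, L, d = x.shape
        h = x.new_zeros(b, d, self.A.shape[1])   # per-channel hidden state
        dt = F.softplus(self.proj_dt(x))         # input-dependent step size, (b, L, d)
        B, C = self.proj_B(x), self.proj_C(x)    # input-dependent B and C, (b, L, state_dim)
        ys = []
        for t in range(L):                       # sequential selective scan, linear in L
            A_bar = torch.exp(dt[:, t].unsqueeze(-1) * self.A)             # (b, d, state_dim)
            h = A_bar * h + dt[:, t].unsqueeze(-1) * B[:, t].unsqueeze(1) * x[:, t].unsqueeze(-1)
            ys.append((h * C[:, t].unsqueeze(1)).sum(-1))                  # (b, d)
        return torch.stack(ys, dim=1)            # (b, L, dim)


class AQPMambaSketch(nn.Module):
    """Four-direction spatiotemporal scanning plus hybrid regression/classification heads."""
    def __init__(self, dim=64, num_classes=6):
        super().__init__()
        self.ssm = SelectiveSSM(dim)
        self.reg_head = nn.Linear(dim, 3)             # e.g. PM2.5, PM10, AQI values
        self.cls_head = nn.Linear(dim, num_classes)   # e.g. AQI category

    def forward(self, feats):                    # feats: (batch, T, H, W, dim) patch features
        b, T, H, W, d = feats.shape
        row = feats.reshape(b, T * H * W, d)                           # row-wise scan
        col = feats.permute(0, 1, 3, 2, 4).reshape(b, T * H * W, d)    # column-wise scan
        seqs = [row, row.flip(1), col, col.flip(1)]                    # forward + reversed scans
        pooled = torch.stack([self.ssm(s).mean(dim=1) for s in seqs]).mean(dim=0)
        return self.reg_head(pooled), self.cls_head(pooled)


if __name__ == "__main__":
    model = AQPMambaSketch()
    dummy = torch.randn(2, 4, 8, 8, 64)          # (batch, frames, height, width, channels)
    reg_out, cls_out = model(dummy)
    print(reg_out.shape, cls_out.shape)          # torch.Size([2, 3]) torch.Size([2, 6])
```

In this sketch the four scan orders expose each spatiotemporal location to bidirectional context while keeping the per-sequence cost linear in its length, and the two output heads share the pooled representation, mirroring the multi-task (regression plus classification) design the abstract attributes to the hybrid predictor.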