Low-cost video-based air quality estimation system using structured deep learning with selective state space modeling.

Journal: Environment International
Published Date:

Abstract

Air quality is crucial for both public health and environmental sustainability. An efficient and cost-effective model is essential for accurate air quality predictions and proactive pollution control. However, existing research primarily focuses on single static image analysis, which does not account for the dynamic and temporal nature of air pollution. Meanwhile, research on video-based air quality estimation remains limited, particularly in achieving accurate multi-pollutant outputs. This study proposes Air Quality Prediction-Mamba (AQP-Mamba), a video-based deep learning model that integrates a structured Selective State Space Model (SSM) with a selective scan mechanism and a hybrid predictor (HP) to estimate air quality. The spatiotemporal forward and backward SSM dynamically adjusts its parameters based on the input, maintains linear complexity, and captures long-range dependencies by processing spatiotemporal features bidirectionally through four scanning techniques (row-wise, column-wise, and their vertical reversals). This allows the model to track pollutant concentrations and air quality variations over time. The model thus extracts spatiotemporal features from video and simultaneously performs regression (PM2.5, PM10, and AQI) and classification (AQI) tasks. A high-quality outdoor hourly air quality dataset (LMSAQV) with 13,176 videos collected from six monitoring stations in Lahore, Pakistan, was used as the case study. The experimental results demonstrate that AQP-Mamba significantly outperforms several state-of-the-art models, including VideoSwin-T, VideoMAE, I3D, VTHCL, and TimeSformer. The proposed model achieves strong regression performance (PM2.5: R = 0.91, PM10: R = 0.90, AQI: R = 0.92) and excellent classification metrics: accuracy (94.57 %), precision (93.86 %), recall (94.20 %), and F1-score (93.44 %).
The proposed model delivers consistent, real-time performance with a latency of 1.98 s per video, offering an effective, scalable, and cost-efficient solution for multi-pollutant estimation. This approach has the potential to address gaps in air quality data collected by expensive instruments globally.
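
The four-direction scanning idea mentioned in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the helper names (`scan_orders`, `selective_scan`, `multi_direction_features`), the sigmoid gate, and the averaging step are all assumptions made for illustration. The sketch works on a single frame's patch grid rather than a full video, and it reads "reversals" as traversing the same sequence backwards. It shows how a patch grid might be flattened in four orders, passed through a toy input-dependent linear recurrence (one pass per sequence, so cost grows linearly with the number of patches), and merged back into the original patch positions.

```python
import numpy as np

def scan_orders(height, width):
    """Four flattening orders over an H x W patch grid (hypothetical helper)."""
    idx = np.arange(height * width).reshape(height, width)
    row_wise = idx.reshape(-1)    # left-to-right, then top-to-bottom
    col_wise = idx.T.reshape(-1)  # top-to-bottom, then left-to-right
    return {
        "row": row_wise,
        "row_reversed": row_wise[::-1],
        "col": col_wise,
        "col_reversed": col_wise[::-1],
    }

def selective_scan(x, w_a):
    """Toy selective recurrence: h_t = a_t * h_{t-1} + (1 - a_t) * x_t.

    The gate a_t is computed from the current input, which is the
    'selective' part. This is an assumed simplification, not the
    parameterization used in the paper.
    """
    h = np.zeros(x.shape[1])
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        a_t = 1.0 / (1.0 + np.exp(-(x[t] * w_a)))  # gate in (0, 1)
        h = a_t * h + (1.0 - a_t) * x[t]
        out[t] = h
    return out

def multi_direction_features(tokens, height, width, w_a):
    """Scan tokens (H*W, D, row-major) in four orders and average the results."""
    merged = np.zeros_like(tokens)
    for order in scan_orders(height, width).values():
        scanned = selective_scan(tokens[order], w_a)
        merged[order] += scanned  # scatter back to original patch positions
    return merged / 4.0

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tokens = rng.normal(size=(6, 4))  # 2 x 3 grid of patches, 4 features each
    features = multi_direction_features(tokens, 2, 3, np.ones(4))
    print(features.shape)
```

Because each of the four orders is a permutation of the patch indices, every patch receives context from patches before and after it in both the row and column directions. Averaging the four passes is one simple way to combine them; other merge rules are equally plausible.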

Authors

  • Maqsood Ahmed
    School of Geography and Information Engineering, China University of Geosciences, Wuhan, 430074, China.
  • Xiang Zhang
    Department of Orthopedics, Orthopedic Research Institute, West China Hospital, Sichuan University, Chengdu, Sichuan, China.
  • Yonglin Shen
    National Engineering Research Center of Geographic Information System, China University of Geosciences, Wuhan, 430074, China.
  • Tanveer Ahmed
  • Shahid Ali
    Department of Speech Language and Hearing Sciences, Faculty of Health Sciences, Ziauddin University, Karachi, Pakistan.
  • Ayaz Ali
    Department of Cybernetics, Nanotechnology and Data Processing, Faculty of Automatic Control, Electronics and Computer Science, Silesian University of Technology, Akademicka 16, 44-100 Gliwice, Poland.
  • Aminjon Gulakhmadov
    Research Center of Ecology and Environment in Central Asia, Xinjiang Institute of Ecology and Geography, Chinese Academy of Sciences, Urumqi 830011, China; Institute of Water Problems, Hydropower and Ecology of the National Academy of Sciences of Tajikistan, Dushanbe 734042, Tajikistan.
  • Won-Ho Nam
    School of Social Safety and Systems Engineering, Institute of Agricultural Environmental Science, National Agricultural Water Research Center, Hankyong National University, Anseong, Republic of Korea.
  • Nengcheng Chen
    National Engineering Research Center of Geographic Information System, School of Geography and Information Engineering, China University of Geosciences, Wuhan 430074, China.