Multi-scale feature pyramid network with bidirectional attention for efficient mural image classification.

Journal: PloS one
Published Date:

Abstract

Mural image recognition plays a critical role in the digital preservation of cultural heritage; however, it faces cross-cultural and multi-period style generalization challenges, compounded by limited sample sizes and intricate details, such as losses caused by natural weathering of mural surfaces and complex artistic patterns. This paper proposes a deep learning model based on DenseNet201-FPN, incorporating a Bidirectional Convolutional Block Attention Module (Bi-CBAM), dynamic focal distillation loss, and convex regularization. First, a lightweight Feature Pyramid Network (FPN) is embedded into DenseNet201 to fuse multi-scale texture features (28 × 28 × 256, 14 × 14 × 512, 7 × 7 × 1024). Second, a bidirectional LSTM-driven attention module iteratively optimizes channel and spatial weights, enhancing detail perception for low-frequency categories. Third, a dynamic temperature distillation strategy (T = 3 → 1) balances supervision from the teacher model (ResNeXt101) and the ground truth, improving the F1-score of rare classes by 6.1%. Experimental results on a self-constructed mural dataset (2,000 images, 26 subcategories) demonstrate 87.9% accuracy (+3.7% over DenseNet201) and real-time inference on edge devices (63 ms/frame at 8.1 W on Jetson TX2). This study provides a cost-effective solution for large-scale mural digitization in resource-constrained environments.
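
The dynamic temperature distillation strategy (T = 3 → 1) described above can be illustrated with a minimal sketch. The abstract does not state how the temperature is annealed or how the teacher and ground-truth terms are weighted, so the linear schedule, the mixing weight `alpha`, and all function names below are assumptions made for illustration only. The focal weighting and convex regularization mentioned in the abstract are omitted.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def temperature_at(epoch, total_epochs, t_start=3.0, t_end=1.0):
    """Hypothetical linear schedule from t_start to t_end (assumed, not from the paper)."""
    if total_epochs <= 1:
        return t_end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return t_start + frac * (t_end - t_start)

def distillation_loss(student_logits, teacher_logits, label, temperature, alpha=0.5):
    """Weighted sum of a soft (teacher) term and a hard (ground-truth) term.

    alpha is an assumed mixing weight; the abstract does not give its value.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    # KL(teacher || student) on the temperature-softened distributions
    kl = sum(
        pt * (math.log(pt) - math.log(max(ps, 1e-12)))
        for pt, ps in zip(p_teacher, p_student)
        if pt > 0.0
    )
    soft_term = kl * temperature ** 2  # conventional T^2 gradient rescaling
    # Cross-entropy against the ground-truth label at temperature 1
    hard_term = -math.log(max(softmax(student_logits)[label], 1e-12))
    return alpha * soft_term + (1.0 - alpha) * hard_term

# Example: the temperature falls from 3 to 1 over a 10-epoch run
student = [2.0, 0.5, -1.0]
teacher = [3.0, 0.2, -2.0]
for epoch in (0, 5, 9):
    t = temperature_at(epoch, 10)
    print(epoch, round(t, 3), round(distillation_loss(student, teacher, 0, t), 4))
```

A high early temperature softens the teacher distribution so that the student also sees the relative similarity between classes; as T approaches 1 the soft targets sharpen and the ground-truth term carries comparatively more of the signal.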

Authors

  • Shulan Wang
    School of Architecture and Art Design, Hebei University of Technology, Tianjin, China.
  • Siyu Liu
    Citromax Flavors Group, Inc., 444 Washington Ave, Carlstadt, NJ, 07072, USA.
  • Mengting Jin
    School of Information and Artificial Intelligence, Anhui Business College, Anhui, China.
  • Pingmei Fan
    School of Business Administration, Guangxi Vocational Normal University, Guangxi, China.