Bringing RGB and IR Together: Hierarchical Multi-Modal Enhancement for Robust Transmission Line Detection
Journal: arXiv
Published Date: Jan 25, 2025
Abstract
Ensuring a stable power supply in rural areas relies heavily on effective
inspection of power equipment, particularly transmission lines (TLs). However,
detecting TLs from aerial imagery can be challenging when dealing with
misalignments between visible light (RGB) and infrared (IR) images, as well as
mismatched high- and low-level features in convolutional networks. To address
these limitations, we propose a novel Hierarchical Multi-Modal Enhancement
Network (HMMEN) that integrates RGB and IR data for robust and accurate TL
detection. Our method introduces two key components: (1) a Mutual Multi-Modal
Enhanced Block (MMEB), which fuses and enhances hierarchical RGB and IR feature
maps in a coarse-to-fine manner, and (2) a Feature Alignment Block (FAB) that
corrects misalignments between decoder outputs and IR feature maps by
leveraging deformable convolutions. We employ MobileNet-based encoders for both
RGB and IR inputs to accommodate edge-computing constraints and reduce
computational overhead. Experimental results on diverse weather and lighting
conditions (fog, night, snow, and daytime) demonstrate the superiority and
robustness of our approach compared to state-of-the-art methods, resulting in
fewer false positives, enhanced boundary delineation, and better overall
detection performance. This framework thus shows promise for practical
large-scale power line inspections with unmanned aerial vehicles.
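
As an illustration of the mechanism the FAB relies on, the following is a minimal pure-Python sketch of a single-channel 3x3 deformable convolution: each kernel tap is read at its regular grid position plus a fractional offset, using bilinear interpolation. It is not the authors' FAB. The function names, the toy feature maps, and the hand-set offsets are assumptions made for illustration; in a trained network the offsets would be predicted by a learned layer.

```python
import math


def bilinear_sample(fmap, y, x):
    """Sample a 2D map at fractional (y, x); out-of-range taps count as zero."""
    h, w = len(fmap), len(fmap[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yi in (y0, y0 + 1):
        for xi in (x0, x0 + 1):
            if 0 <= yi < h and 0 <= xi < w:
                weight = (1.0 - abs(y - yi)) * (1.0 - abs(x - xi))
                value += weight * fmap[yi][xi]
    return value


def deformable_conv3x3(fmap, kernel, offsets):
    """3x3 deformable convolution, single channel, stride 1, zero padding.

    offsets[i][j] holds nine (dy, dx) pairs, one per kernel tap.
    They are supplied by hand here (an assumption for illustration).
    """
    h, w = len(fmap), len(fmap[0])
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k in range(9):
                ky, kx = k // 3 - 1, k % 3 - 1  # regular grid position of tap k
                dy, dx = offsets[i][j][k]       # offset added to that position
                acc += kernel[k // 3][k % 3] * bilinear_sample(
                    fmap, i + ky + dy, j + kx + dx
                )
            out[i][j] = acc
    return out


def constant_offsets(h, w, dy, dx):
    """Hypothetical helper: the same (dy, dx) for every pixel and every tap."""
    return [[[(dy, dx)] * 9 for _ in range(w)] for _ in range(h)]


H, W = 4, 5
# Toy "IR" map: a thin vertical line at column 2.
ir_line = [[1.0 if j == 2 else 0.0 for j in range(W)] for _ in range(H)]
# Toy "decoder" map: the same line, misaligned by one pixel (column 3).
decoder = [[1.0 if j == 3 else 0.0 for j in range(W)] for _ in range(H)]

identity = [[0.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0]]

# Sampling one pixel to the right pulls the decoder line back onto column 2.
aligned = deformable_conv3x3(decoder, identity, constant_offsets(H, W, 0.0, 1.0))

# A half-pixel offset blends columns 2 and 3 of the decoder map.
half = deformable_conv3x3(decoder, identity, constant_offsets(H, W, 0.0, 0.5))
```

With the identity kernel and a constant one-pixel offset, `aligned` matches `ir_line` exactly, which shows how offsets can compensate for a spatial shift between two feature maps. A practical implementation would operate on multi-channel tensors and learn the offsets from the features themselves.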