An Efficient and Mixed Heterogeneous Model for Image Restoration
Journal: arXiv
Published Date: Apr 15, 2025
Abstract
Image restoration (IR), as a fundamental multimedia data processing task, has
a significant impact on downstream visual applications. In recent years,
researchers have focused on developing general-purpose IR models capable of
handling diverse degradation types, thereby reducing the cost and complexity of
model development. Current mainstream approaches are based on three
architectural paradigms: CNNs, Transformers, and Mamba. CNNs offer
efficient inference, whereas Transformers and Mamba excel at capturing
long-range dependencies and modeling global contexts. While each architecture
has demonstrated success in specialized, single-task settings, limited efforts
have been made to effectively integrate heterogeneous architectures to jointly
address diverse IR challenges. To bridge this gap, we propose RestorMixer, an
efficient and general-purpose IR model based on mixed-architecture fusion.
RestorMixer adopts a three-stage encoder-decoder structure, where each stage is
tailored to the resolution and feature characteristics of the input. In the
initial high-resolution stage, CNN-based blocks are employed to rapidly extract
shallow local features. In the subsequent stages, we integrate a refined
multi-directional scanning Mamba module with a multi-scale window-based
self-attention mechanism. This hierarchical and adaptive design enables the
model to leverage the strengths of CNNs in local feature extraction, Mamba in
global context modeling, and attention mechanisms in dynamic feature
refinement. Extensive experimental results demonstrate that RestorMixer
achieves leading performance across multiple IR tasks while maintaining high
inference efficiency. The official code can be accessed at
https://github.com/ClimBin/RestorMixer.
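
The stage layout described above can be illustrated with a minimal sketch. It is not the official implementation. It only records which block types the abstract assigns to each of the three stages, and it shows a generic four-direction scan of the kind commonly used with 2D Mamba models. The names `StagePlan`, `plan_stages`, and `scan_orders` are hypothetical. The 2x downsampling between stages is an assumption, since the abstract does not state the factor. The scan shown is the plain row/column variant, not the refined scanning the paper proposes.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class StagePlan:
    """Resolution and block types for one encoder stage (hypothetical type)."""
    level: int
    height: int
    width: int
    blocks: Tuple[str, ...]


def plan_stages(height: int, width: int, num_stages: int = 3) -> List[StagePlan]:
    """Assign block types per stage as the abstract describes.

    Assumption: each stage halves the spatial resolution of the previous one.
    """
    plans = []
    for level in range(num_stages):
        scale = 2 ** level
        # Stage 0 runs at full resolution and uses only CNN blocks;
        # later stages combine Mamba scanning with window attention.
        blocks = ("cnn",) if level == 0 else ("mamba_scan", "window_attention")
        plans.append(StagePlan(level, height // scale, width // scale, blocks))
    return plans


def scan_orders(height: int, width: int) -> List[List[int]]:
    """Return four flattening orders over an H x W grid of token indices.

    Orders: row-major, reversed row-major, column-major, reversed column-major.
    This is a generic multi-directional scan, not the paper's refined variant.
    """
    row_major = [r * width + c for r in range(height) for c in range(width)]
    col_major = [r * width + c for c in range(width) for r in range(height)]
    return [row_major, row_major[::-1], col_major, col_major[::-1]]


if __name__ == "__main__":
    for plan in plan_stages(256, 256):
        print(plan)
    for order in scan_orders(2, 3):
        print(order)
```

Each scan order defines a different 1D sequence over the same feature map, so a sequence model that processes all four sees every position with context from several directions.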