MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation
Journal: arXiv
Published Date: Mar 11, 2025
Abstract
Low-resolution image segmentation is crucial in real-world applications such
as robotics, augmented reality, and large-scale scene understanding, where
high-resolution data is often unavailable due to computational constraints. To
address this challenge, we propose MaskAttn-UNet, a novel segmentation
framework that enhances the traditional U-Net architecture via a mask attention
mechanism. Our model selectively emphasizes important regions while suppressing
irrelevant backgrounds, thereby improving segmentation accuracy in cluttered
and complex scenes. Unlike conventional U-Net variants, MaskAttn-UNet
effectively balances local feature extraction with broader contextual
awareness, making it particularly well-suited for low-resolution inputs. We
evaluate our approach on three benchmark datasets with input images rescaled to
128×128 pixels and demonstrate competitive performance across semantic, instance, and
panoptic segmentation tasks. Our results show that MaskAttn-UNet achieves
accuracy comparable to state-of-the-art methods at significantly lower
computational cost than transformer-based models, making it an efficient and
scalable solution for low-resolution segmentation in resource-constrained
scenarios.
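
To illustrate the idea of emphasizing important regions while suppressing background, the following is a minimal sketch of generic masked scaled dot-product attention. It is not the MaskAttn-UNet implementation: the abstract does not specify how the mask is produced or applied, so the function name `masked_attention`, the binary per-position mask, and the toy feature vectors are all assumptions made for illustration.

```python
import math


def masked_attention(queries, keys, values, mask):
    """Scaled dot-product attention over feature vectors (hypothetical helper).

    mask[j] == 0 hides position j from every query, so it contributes
    nothing to any output. This is a generic formulation, assumed here
    for illustration only.
    """
    if not any(mask):
        raise ValueError("mask must keep at least one position")
    dim = len(keys[0])
    outputs = []
    for q in queries:
        # Masked-out positions get a score of -inf, so softmax assigns them weight 0.
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            if keep else float("-inf")
            for k, keep in zip(keys, mask)
        ]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        weights = [e / total for e in exps]
        # Each output is a weighted average of the visible value vectors.
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy example: three "pixel" features; the third stands in for background clutter.
features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
mask = [1, 1, 0]  # assumed binary mask: keep the first two positions only
attended = masked_attention(features, features, features, mask)
unmasked = masked_attention(features, features, features, [1, 1, 1])
```

With the mask applied, every output row is a mixture of the first two feature vectors only; without it, the large third vector dominates the output for its own query. A learned model would predict the mask rather than take it as given, which this sketch does not attempt.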