Ultra-High Resolution Segmentation via Boundary-Enhanced Patch-Merging Transformer
Journal: arXiv
Published Date: Dec 13, 2024
Abstract
Segmentation of ultra-high resolution (UHR) images is a critical task with
numerous applications, yet it poses significant challenges due to high spatial
resolution and rich fine details. Recent approaches adopt a dual-branch
architecture, where a global branch learns long-range contextual information
and a local branch captures fine details. However, these approaches struggle
to reconcile conflicting global and local information, and they add significant
computational cost. Inspired by the human visual system's ability to rapidly
orient attention to important areas with fine details and filter out irrelevant
information, we propose a novel UHR segmentation method called
Boundary-enhanced Patch-merging Transformer (BPT). BPT consists of two key
components: (1) Patch-Merging Transformer (PMT) for dynamically allocating
tokens to informative regions to acquire global and local representations, and
(2) Boundary-Enhanced Module (BEM) that leverages boundary information to
enrich fine details. Extensive experiments on multiple UHR image segmentation
benchmarks demonstrate that BPT outperforms previous state-of-the-art
methods without introducing extra computational overhead. Code will be
released to facilitate research.
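
The abstract does not specify how PMT merges patches or how BEM derives boundary information, so the following is only a minimal sketch of the two general ideas and not the authors' implementation. It assumes a quadtree-style split driven by intensity variance as a stand-in for "allocating tokens to informative regions", and a neighbor-difference test on a label mask as a stand-in for "boundary information". The function names `merge_patches` and `boundary_map`, the `threshold` and `min_size` parameters, and the square power-of-two input size are all hypothetical choices made for illustration.

```python
from statistics import pvariance


def merge_patches(image, min_size=1, threshold=0.0):
    """Partition a square image (list of lists) into (row, col, size) patches.

    Assumption: a region is "informative" when its intensity variance exceeds
    `threshold`. Uniform regions stay merged as one large patch (one token);
    informative regions are split into four until `min_size` is reached.
    The side length is assumed to be a power of two.
    """

    def recurse(row, col, size):
        values = [
            image[i][j]
            for i in range(row, row + size)
            for j in range(col, col + size)
        ]
        # Stop splitting when the patch is minimal or (near-)uniform.
        if size <= min_size or pvariance(values) <= threshold:
            return [(row, col, size)]
        half = size // 2
        patches = []
        for d_row in (0, half):
            for d_col in (0, half):
                patches.extend(recurse(row + d_row, col + d_col, half))
        return patches

    return recurse(0, 0, len(image))


def boundary_map(labels):
    """Mark pixels whose right or lower neighbor carries a different label."""
    height, width = len(labels), len(labels[0])
    edges = [[0] * width for _ in range(height)]
    for i in range(height):
        for j in range(width):
            differs_right = j + 1 < width and labels[i][j] != labels[i][j + 1]
            differs_below = i + 1 < height and labels[i][j] != labels[i + 1][j]
            if differs_right or differs_below:
                edges[i][j] = 1
    return edges


# Toy 4x4 image: only the bottom-right quadrant contains detail.
toy_image = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 9],
    [0, 0, 5, 5],
]
toy_patches = merge_patches(toy_image)
print(len(toy_patches))  # 7 patches instead of 16 uniform pixels
```

In this toy case the three uniform quadrants each remain a single patch while the detailed quadrant is split into four, which shows how a variable-size partition can concentrate tokens on detailed areas without increasing the total token count. A learned model would presumably replace the variance test with a learned importance score; that substitution is outside what the abstract states.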