XY-Cut++: Advanced Layout Ordering via Hierarchical Mask Mechanism on a Novel Benchmark
Journal:
arXiv
Published Date:
Apr 14, 2025
Abstract
Document Reading Order Recovery is a fundamental task in document image
understanding, playing a pivotal role in enhancing Retrieval-Augmented
Generation (RAG) and serving as a critical preprocessing step for large
language models (LLMs). Existing methods often struggle with complex
layouts (e.g., multi-column newspapers), high-overhead interactions between
cross-modal elements (visual regions and textual semantics), and a lack of
robust evaluation benchmarks. We introduce XY-Cut++, an advanced layout
ordering method that integrates pre-mask processing, multi-granularity
segmentation, and cross-modal matching to address these challenges. Our method
significantly enhances layout ordering accuracy compared to traditional XY-Cut
techniques. Specifically, XY-Cut++ achieves state-of-the-art performance (98.8
BLEU overall) while maintaining simplicity and efficiency. It outperforms
existing baselines by up to 24% and demonstrates consistent accuracy across
simple and complex layouts on the newly introduced DocBench-100 dataset. This
advancement establishes a reliable foundation for document structure recovery,
setting a new standard for layout ordering tasks and facilitating more
effective RAG and LLM preprocessing.
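
For context, the traditional XY-Cut baseline that the abstract compares against can be sketched as a recursive projection split over layout bounding boxes. The sketch below is only an illustration of that classic idea under simple assumptions (axis-aligned boxes given as (x0, y0, x1, y1), top-to-bottom then left-to-right reading); it is not the XY-Cut++ method, and it omits the pre-mask processing, multi-granularity segmentation, and cross-modal matching described above. The names `Box`, `_split`, and `xy_cut` are hypothetical and chosen for this example.

```python
from typing import List, Tuple

# Hypothetical box format for this sketch: (x0, y0, x1, y1), origin at top-left.
Box = Tuple[int, int, int, int]


def _split(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes into runs separated by empty gaps in the projection
    onto one axis (0 = x, 1 = y)."""
    lo, hi = axis, axis + 2
    ordered = sorted(boxes, key=lambda b: b[lo])
    groups = [[ordered[0]]]
    reach = ordered[0][hi]  # furthest extent covered by the current group
    for b in ordered[1:]:
        if b[lo] >= reach:
            # Empty strip in the projection: start a new group (a "cut").
            groups.append([b])
        else:
            groups[-1].append(b)
        reach = max(reach, b[hi])
    return groups


def xy_cut(boxes: List[Box]) -> List[Box]:
    """Return boxes in an estimated reading order using classic recursive XY-Cut."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try horizontal cuts (gaps along y) first, then vertical cuts (gaps along x).
    for axis in (1, 0):
        groups = _split(boxes, axis)
        if len(groups) > 1:
            return [b for g in groups for b in xy_cut(g)]
    # No clean cut exists: fall back to a plain top-to-bottom, left-to-right sort.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


# A full-width title above two columns, given in scrambled order.
title = (0, 0, 90, 10)
left_1, left_2 = (0, 20, 40, 50), (0, 60, 40, 100)
right_1, right_2 = (50, 20, 90, 70), (50, 80, 90, 100)
reading_order = xy_cut([right_2, left_1, title, right_1, left_2])
# reading_order == [title, left_1, left_2, right_1, right_2]
```

The fallback sort at the end marks where this plain baseline runs out of options: when boxes overlap in both projections (for example, an element spanning columns in the middle of a page), no clean cut exists, which is the kind of complex layout the abstract says existing methods struggle with.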