Every Pixel Tells a Story: End-to-End Urdu Newspaper OCR

Journal: arXiv

Published Date: May 20, 2025

Abstract

This paper introduces an end-to-end pipeline for Optical Character Recognition (OCR) on Urdu newspapers. The approach addresses three challenges: complex multi-column layouts, low-resolution archival scans, and diverse font styles. We decompose the OCR task into four modules: (1) article segmentation, (2) image super-resolution, (3) column segmentation, and (4) text recognition. For article segmentation, we fine-tune and evaluate YOLOv11x to identify and separate individual articles from cluttered layouts; the model achieves a precision of 0.963 and mAP@50 of 0.975. For super-resolution, we fine-tune and benchmark the SwinIR model, which reaches 32.71 dB PSNR, to enhance the quality of degraded newspaper scans. For column segmentation, we use YOLOv11x to separate text columns, which further improves performance; this model reaches a precision of 0.970 and mAP@50 of 0.975. In the text recognition stage, we benchmark LLMs from several families, including Gemini, GPT, Llama, and Claude. Gemini-2.5-Pro achieves the lowest word error rate (WER), 0.133.
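The WER figure reported for the text recognition stage can be illustrated with a minimal sketch. It assumes the standard definition of WER: the word-level edit distance between the reference and the recognized text, divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration; the abstract does not state how the authors tokenize or normalize text before scoring.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Hypothetical helper: whitespace tokenization is an assumption,
    not necessarily the scoring procedure used in the paper.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                 # deletion (reference word missed)
                curr[j - 1] + 1,             # insertion (extra hypothesis word)
                prev[j - 1] + substitution,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One of four reference words is missing from the OCR output.
    print(word_error_rate("یہ ایک خبر ہے", "یہ ایک خبر"))
```

Under this definition, a WER of 0.133 corresponds to roughly 13 word-level errors per 100 reference words. The value can exceed 1.0 when the hypothesis contains many inserted words.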

Authors

Samee Arif
Sualeha Farid

External Resources

View on arXiv (http://arxiv.org/abs/2505.13943v1)
