A Benchmark Dataset and a Framework for Urdu Multimodal Named Entity Recognition

Journal: arXiv

Published Date: May 8, 2025

Abstract

The emergence of multimodal content, particularly text and images on social media, has positioned Multimodal Named Entity Recognition (MNER) as an increasingly important area of research within Natural Language Processing. Despite progress in high-resource languages such as English, MNER remains underexplored for low-resource languages like Urdu. The primary challenges include the scarcity of annotated multimodal datasets and the lack of standardized baselines. To address these challenges, we introduce the U-MNER framework and release the Twitter2015-Urdu dataset, a pioneering resource for Urdu MNER. Adapted from the widely used Twitter2015 dataset, it is annotated with Urdu-specific grammar rules. We establish benchmark baselines by evaluating both text-based and multimodal models on this dataset, providing comparative analyses to support future research on Urdu MNER. The U-MNER framework integrates textual and visual context using Urdu-BERT for text embeddings and ResNet for visual feature extraction, with a Cross-Modal Fusion Module to align and fuse information. Our model achieves state-of-the-art performance on the Twitter2015-Urdu dataset, laying the groundwork for further MNER research in low-resource languages.
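The fusion step described in the abstract can be illustrated with a minimal sketch. The abstract does not specify how the Cross-Modal Fusion Module works, so this example assumes a simple scaled dot-product cross-attention in which each text token attends over image region features, followed by a residual addition. The function name `cross_modal_fuse` and the toy vectors are hypothetical; in the actual framework the inputs would come from Urdu-BERT and ResNet rather than hand-written lists.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def cross_modal_fuse(text_vecs, image_vecs):
    """Assumed fusion: each token attends over image regions, and the
    attended visual context is added to the token vector (residual)."""
    dim = len(text_vecs[0])
    scale = math.sqrt(dim)
    fused = []
    for token in text_vecs:
        # Attention weights of this token over all image regions.
        weights = softmax([dot(token, region) / scale for region in image_vecs])
        # Weighted sum of image region vectors.
        context = [
            sum(w * region[i] for w, region in zip(weights, image_vecs))
            for i in range(dim)
        ]
        fused.append([t + c for t, c in zip(token, context)])
    return fused


# Toy stand-ins for text embeddings (3 tokens) and visual features (2 regions).
text_vecs = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_vecs = [[1.0, 0.0], [0.0, 1.0]]
fused = cross_modal_fuse(text_vecs, image_vecs)
```

Each fused vector keeps the shape of its token vector, so a sequence-labeling layer for entity tags could consume it unchanged. A learned projection or gating step would normally precede the addition; it is omitted here to keep the sketch short.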

Authors

Hussain Ahmad
Qingyang Zeng
Jing Wan

External Resources

View on arXiv (http://arxiv.org/abs/2505.05148v1)
