TriNER: A Series of Named Entity Recognition Models For Hindi, Bengali & Marathi
Journal: arXiv
Published Date: Feb 6, 2025
Abstract
India's rich cultural and linguistic diversity poses several challenges in
Natural Language Processing (NLP), particularly in Named Entity
Recognition (NER). NER is an NLP task that identifies tokens and classifies them
into entity groups such as Person, Location, Organization, and Number.
This makes NER useful for downstream tasks such as context-aware
anonymization. This paper details our work to build a multilingual NER model
for the three most spoken languages in India: Hindi, Bengali, and Marathi. We
train a custom transformer model and fine-tune a few pretrained models,
achieving an F1 score of 92.11 across a total of 6 entity groups. With this
work, we aim to introduce a single model that performs NER and significantly
reduces the inconsistencies in entity groups and tag names across the three
languages.
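
To make the two ideas in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It shows how inconsistent tag names from different datasets could be mapped onto one shared tag set, and how the resulting tags could drive a simple anonymization step. The tag spellings in `UNIFIED_TAGS`, the BIO prefix convention, and the function names `unify_tag` and `anonymize` are all assumptions made for illustration; the abstract does not specify the tag scheme or list all six entity groups.

```python
# Hypothetical mapping from dataset-specific tag names to one shared tag set.
# These spellings are illustrative assumptions, not the paper's actual tags.
UNIFIED_TAGS = {
    "PER": "PERSON", "PERSON": "PERSON",
    "LOC": "LOCATION", "LOCATION": "LOCATION",
    "ORG": "ORGANIZATION", "ORGANIZATION": "ORGANIZATION",
    "NUM": "NUMBER", "NUMBER": "NUMBER",
}


def unify_tag(tag: str) -> str:
    """Map a tag such as 'B-PER' or 'LOC' onto the shared tag set."""
    if tag == "O":
        return "O"
    prefix, _, label = tag.partition("-")
    if not label:  # the tag had no B-/I- prefix
        prefix, label = "", prefix
    unified = UNIFIED_TAGS.get(label.upper())
    if unified is None:
        return "O"  # unknown labels are treated as non-entities in this sketch
    return f"{prefix}-{unified}" if prefix else unified


def anonymize(tokens, tags):
    """Replace each tagged entity span with a single placeholder token."""
    out = []
    prev_label = "O"
    for token, tag in zip(tokens, tags):
        tag = unify_tag(tag)
        if tag == "O":
            out.append(token)
            prev_label = "O"
            continue
        label = tag.split("-")[-1]
        # An I- tag that continues the previous entity adds no new placeholder.
        is_continuation = tag.startswith("I-") and prev_label == label
        if not is_continuation:
            out.append(f"[{label}]")
        prev_label = label
    return out


tokens = ["Ravi", "Kumar", "lives", "in", "Pune"]
tags = ["B-PER", "I-PER", "O", "O", "B-LOC"]
print(anonymize(tokens, tags))
```

In practice the tags would come from a token-classification model rather than a hand-written list; the point here is only that a single shared tag set lets the same downstream code serve all three languages.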