Opioid Named Entity Recognition (ONER-2025) from Reddit
Journal:
arXiv
Published Date:
Mar 28, 2025
Abstract
The opioid overdose epidemic remains a critical public health crisis,
particularly in the United States, leading to significant mortality and
societal costs. Social media platforms like Reddit provide vast amounts of
unstructured data that offer insights into public perceptions, discussions, and
experiences related to opioid use. This study leverages Natural Language
Processing (NLP), specifically Opioid Named Entity Recognition (ONER-2025), to
extract actionable information from these platforms. Our research makes four
key contributions. First, we created a unique, manually annotated dataset
sourced from Reddit, where users share self-reported experiences of opioid use
via different administration routes. This dataset contains 331,285 tokens and
includes eight major opioid entity categories. Second, we detail our annotation
process and guidelines while discussing the challenges of labeling the
ONER-2025 dataset. Third, we analyze key linguistic challenges, including
slang, ambiguity, fragmented sentences, and emotionally charged language, in
opioid discussions. Fourth, we propose a real-time monitoring system to process
streaming data from social media, healthcare records, and emergency services to
identify overdose events. Our system integrates machine learning, deep
learning, and transformer-based language models with contextual embeddings,
which we evaluate in 11 experiments using 5-fold cross-validation. Our
transformer-based models (bert-base-NER and roberta-base) achieved 97%
accuracy and F1-score, outperforming the baselines by 10.23% in relative
terms (Random Forest, RF = 0.88).
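
The evaluation arithmetic in the abstract can be illustrated with a minimal sketch. Two assumptions are made here that the abstract does not confirm: the folds are built as plain contiguous splits (the authors may have shuffled, stratified, or grouped by post), and the 10.23% figure is read as a relative gain of 0.97 over the RF score of 0.88. The function names `k_fold_indices` and `relative_improvement` are hypothetical and chosen for illustration only.

```python
def k_fold_indices(n_items, k=5):
    """Yield (train, test) index lists for k contiguous, near-equal folds.

    Assumption: simple contiguous splitting, not the authors' exact protocol.
    """
    base, extra = divmod(n_items, k)
    start = 0
    for fold in range(k):
        # The first `extra` folds take one additional item each.
        size = base + (1 if fold < extra else 0)
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_items))
        yield train, test
        start += size


def relative_improvement(model_score, baseline_score):
    """Relative gain of the model over the baseline, in percent."""
    return (model_score - baseline_score) / baseline_score * 100.0


# Toy example: 12 items split into 5 folds.
folds = list(k_fold_indices(n_items=12, k=5))

# Scores as reported in the abstract: transformers 0.97, RF baseline 0.88.
gain = relative_improvement(0.97, 0.88)
print(f"{gain:.2f}%")  # prints "10.23%"
```

Under this reading, the reported 10.23% matches (0.97 - 0.88) / 0.88, which suggests the figure is a relative improvement and not a difference in percentage points (that difference would be 9 points).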