Ensemble BERT for Medication Event Classification on Electronic Health Records (EHRs)
Journal:
arXiv
Published Date:
Jun 29, 2025
Abstract
Identification of key variables such as medications, diseases, relations from
health records and clinical notes has a wide range of applications in the
clinical domain. n2c2 2022 provided shared tasks on challenges in natural
language processing for clinical data analytics on electronic health records
(EHR), where it built a comprehensive annotated clinical data Contextualized
Medication Event Dataset (CMED). This study focuses on subtask 2 in Track 1 of
this challenge that is to detect and classify medication events from clinical
notes through building a novel BERT-based ensemble model. It started with
pretraining BERT models on different types of big data such as Wikipedia and
MIMIC. Afterwards, these pretrained BERT models were fine-tuned on CMED
training data. These fine-tuned BERT models were employed to accomplish
medication event classification on CMED testing data with multiple predictions.
These multiple predictions generated by these fine-tuned BERT models were
integrated to build final prediction with voting strategies. Experimental
results demonstrated that BERT-based ensemble models can effectively improve
strict Micro-F score by about 5% and strict Macro-F score by about 6%,
respectively.