Structured LLM Augmentation for Clinical Information Extraction.

Journal: Studies in health technology and informatics

Published Date: Aug 7, 2025

Abstract

Information extraction tasks, such as Named Entity Recognition (NER) and Relation Extraction (RE), are essential for advancing clinical research and applications. However, these tasks are hindered by the scarcity of labeled clinical documents due to privacy concerns and high annotation costs. This study introduces a novel framework combining Large Language Models (LLMs) for data augmentation with an adapted BERT model for clinical information extraction. The framework encodes entity and relational information within clinical note segments, enabling LLMs to generate diverse and contextually accurate augmentations while preserving structural integrity. Augmented data is used to train a segmentation-based BERT model, overcoming sequence length limitations and integrating global context via BiLSTM. Evaluations on public and proprietary datasets demonstrate significant performance improvements, highlighting the approach's potential to address data scarcity in clinical information extraction tasks.
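The abstract does not give the encoding format, so the following is only a minimal sketch of the general idea: entities and relations are written into a note segment as inline markup, and an augmented segment is accepted only if it carries the same entity and relation structure. The tag syntax, the `[REL ...]` suffix, the function names, and the example sentence are all assumptions made for illustration, and the augmented string is hand-written to stand in for an LLM output.

```python
import re

# Hypothetical markup: <LABEL id=eN>text</LABEL> for entities and
# [REL NAME head->tail] for relations. Not taken from the paper.
TAG = re.compile(r"<(\w+) id=(\w+)>(.*?)</\1>")
REL = re.compile(r"\[REL (\w+) (\w+)->(\w+)\]")


def encode_segment(text, entities, relations):
    """Wrap each entity span in inline tags and append relation markers.

    entities:  list of (entity_id, start, end, label), non-overlapping
    relations: list of (relation_label, head_id, tail_id)
    """
    out, cursor = [], 0
    for ent_id, start, end, label in sorted(entities, key=lambda e: e[1]):
        out.append(text[cursor:start])
        out.append(f"<{label} id={ent_id}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    rel_suffix = "".join(f" [REL {r} {h}->{t}]" for r, h, t in relations)
    return "".join(out) + rel_suffix


def structure(encoded):
    """Return the (entity id, label) pairs and relation triples in a segment."""
    ents = sorted((m.group(2), m.group(1)) for m in TAG.finditer(encoded))
    rels = sorted(m.groups() for m in REL.finditer(encoded))
    return ents, rels


def preserves_structure(original, augmented):
    """True if the augmented segment keeps the same entities and relations."""
    return structure(original) == structure(augmented)


def span(text, phrase):
    """Character offsets of the first occurrence of phrase in text."""
    start = text.index(phrase)
    return start, start + len(phrase)


text = "Patient started metformin 500 mg for diabetes."
entities = [
    ("e1", *span(text, "metformin"), "DRUG"),
    ("e2", *span(text, "500 mg"), "DOSE"),
]
relations = [("DOSE_OF", "e2", "e1")]
encoded = encode_segment(text, entities, relations)

# Hand-written stand-ins for LLM outputs: one keeps the markup, one drops a tag.
good = ("The patient was put on <DRUG id=e1>glucophage</DRUG> at "
        "<DOSE id=e2>1000 mg</DOSE> daily. [REL DOSE_OF e2->e1]")
bad = "The patient was put on <DRUG id=e1>glucophage</DRUG> daily. [REL DOSE_OF e2->e1]"

print(encoded)
print(preserves_structure(encoded, good), preserves_structure(encoded, bad))
```

A filter of this kind is one plausible way to discard augmentations that lose or invent annotations before they reach the downstream model; the check used in the study itself may differ.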

Authors

Ying Wei

School of Information Science and Engineering, Northeastern University, Shenyang 110004, China; Key Laboratory of Medical Imaging Calculation of the Ministry of Education, Shenyang 110004, China.

Qi Li

The First Affiliated Hospital of Yangtze University, Jingzhou, Hubei, China.

Jay Pillai

Truveta.

Keywords

Data Mining; Electronic Health Records; Humans; Information Storage and Retrieval; Natural Language Processing; Programming Languages

External Resources

PubMed (40776002)
