Structured LLM Augmentation for Clinical Information Extraction.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Information extraction tasks, such as Named Entity Recognition (NER) and Relation Extraction (RE), are essential for advancing clinical research and applications. However, these tasks are hindered by the scarcity of labeled clinical documents due to privacy concerns and high annotation costs. This study introduces a novel framework combining Large Language Models (LLMs) for data augmentation with an adapted BERT model for clinical information extraction. The framework encodes entity and relational information within clinical note segments, enabling LLMs to generate diverse and contextually accurate augmentations while preserving structural integrity. Augmented data is used to train a segmentation-based BERT model, overcoming sequence length limitations and integrating global context via BiLSTM. Evaluations on public and proprietary datasets demonstrate significant performance improvements, highlighting the approach's potential to address data scarcity in clinical information extraction tasks.
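
The abstract does not specify how entity and relational information is encoded within a segment. As a minimal illustrative sketch only, and not the authors' implementation, one way to do this is to wrap each annotated span in inline label tags before prompting an LLM, then check that the generated text still carries the same labels. The tag format, the function names (encode_segment, decode_segment, preserves_structure), and the DRUG/PROBLEM labels are all assumptions made for illustration. Relation encoding and the LLM call itself are omitted.

```python
import re
from collections import Counter

# Hypothetical inline tag format: <LABEL>surface text</LABEL>
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>")


def encode_segment(text, entities):
    """Wrap each non-overlapping (start, end, label) span in inline tags."""
    out, cursor = [], 0
    for start, end, label in sorted(entities):
        out.append(text[cursor:start])
        out.append(f"<{label}>{text[start:end]}</{label}>")
        cursor = end
    out.append(text[cursor:])
    return "".join(out)


def decode_segment(tagged):
    """Strip tags; return the plain text and the recovered (start, end, label) spans."""
    plain, entities = [], []
    cursor = 0   # read position in the tagged string
    offset = 0   # write position in the plain string
    for m in TAG_RE.finditer(tagged):
        before = tagged[cursor:m.start()]
        plain.append(before)
        offset += len(before)
        surface = m.group(2)
        entities.append((offset, offset + len(surface), m.group(1)))
        plain.append(surface)
        offset += len(surface)
        cursor = m.end()
    plain.append(tagged[cursor:])
    return "".join(plain), entities


def preserves_structure(original_tagged, augmented_tagged):
    """True if the augmentation keeps the same multiset of entity labels."""
    def labels(s):
        return Counter(m.group(1) for m in TAG_RE.finditer(s))
    return labels(original_tagged) == labels(augmented_tagged)
```

In use, a tagged segment would be sent to the LLM with an instruction to paraphrase while keeping the tags, and any output failing preserves_structure would be discarded. Outputs that pass can be turned back into training spans with decode_segment.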

Authors

  • Ying Wei
    School of Information Science and Engineering, Northeastern University, Shenyang 110004, China; Key Laboratory of Medical Imaging Calculation of the Ministry of Education, Shenyang 110004, China.
  • Qi Li
    The First Affiliated Hospital of Yangtze University, Jingzhou, Hubei, China.
  • Jay Pillai
    Truveta.