Boosting Social Determinants of Health Extraction with Semantic Knowledge Augmented Large Language Model.
Journal:
AMIA ... Annual Symposium proceedings. AMIA Symposium
Published Date:
May 22, 2025
Abstract
Social determinants of health (SDoH) significantly impact health outcomes and contribute to perpetuating health disparities across healthcare applications. However, automatic extraction of SDoH information from Electronic Health Records (EHRs) is challenging because SDoH-related information is embedded in unstructured clinical narratives. Recent advances in Large Language Models (LLMs) have shown great promise for automated SDoH extraction, but their performance suffers on imbalanced SDoH categories, for which training data is scarce. To address this, we propose an approach that augments LLMs with semantic knowledge obtained from the Unified Medical Language System (UMLS). This strategy enriches the feature representations of imbalanced SDoH classes, leading to more accurate SDoH extraction. More specifically, our proposed data augmentation strategy generates semantically enriched clinical narratives at the LLM pre-finetuning stage. This enables the LLM to better adapt to the target data and provides a good initialization for the finetuning stage. In extensive experiments on the publicly available MIMIC-SDoH data, the proposed approach demonstrates significant improvement in SDoH extraction, especially for the imbalanced classes.
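
The abstract does not spell out how the semantically enriched narratives are generated, so the following is only a minimal sketch of one common form of knowledge-based augmentation: substituting known terms with synonyms. It is not the authors' method. The `TOY_SYNONYMS` table is a hypothetical stand-in for a UMLS lookup, and the `augment` function and the example note are invented for illustration.

```python
import re

# Hypothetical stand-in for a UMLS lookup: surface term -> synonymous phrases.
# A real system would query UMLS; these entries are illustrative only.
TOY_SYNONYMS = {
    "homeless": ["unhoused", "without stable housing"],
    "unemployed": ["out of work", "jobless"],
}


def augment(note, synonyms=TOY_SYNONYMS):
    """Return variants of `note`, each with one known term swapped for a synonym."""
    variants = []
    for term, alternatives in synonyms.items():
        # Match the term as a whole word, ignoring case.
        pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
        if not pattern.search(note):
            continue
        for alt in alternatives:
            # Replace every occurrence of this term with the chosen synonym.
            variants.append(pattern.sub(alt, note))
    return variants


note = "Patient is homeless and currently unemployed."
for variant in augment(note):
    print(variant)
```

Under these assumptions, each variant keeps the label of the original note, so a rare SDoH class gains additional training examples with different surface wording before the main finetuning step.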