Unveiling social determinants of health impact on adverse pregnancy outcomes through natural language processing.
Journal:
Scientific Reports
Published Date:
Aug 9, 2025
Abstract
Understanding the role of Social Determinants of Health (SDoH) in pregnancy outcomes is critical for improving maternal and infant health, yet extracting SDoH from unstructured electronic health records remains challenging. We trained and evaluated natural language processing (NLP) models for SDoH extraction from clinical notes in the MIMIC-III database (86 notes) and externally evaluated them on the MIMIC-IV database (171 notes) to assess generalizability. Focusing on social support, occupation, and substance use, we compared rule-based, word embedding, and contextual language models. The ClinicalBERT model with a decision tree classifier achieved the highest performance for social support (F1: 0.92), keyword processing performed best for occupation (F1: 0.74), and word embeddings with a random forest performed best for substance use (F1: 0.83). Logistic regression revealed significant associations between pregnancy complications and both substance use (OR 6.47, p < 0.001) and social support (OR 0.07, p < 0.001). Our study demonstrates the feasibility of NLP for SDoH extraction and underscores the clinical relevance of SDoH in maternal health.
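As a reading aid for the odds ratios reported above, the following is a minimal sketch of how an odds ratio (OR) relates to a logistic regression coefficient: the OR for a predictor is the exponential of its coefficient. The function names are hypothetical and are not taken from the study; only the two OR values come from the abstract, and nothing here reproduces the authors' model or data.

```python
import math


def odds_ratio_to_coefficient(odds_ratio: float) -> float:
    """Return the logistic regression coefficient (log-odds) for an odds ratio."""
    return math.log(odds_ratio)


def coefficient_to_odds_ratio(beta: float) -> float:
    """Return the odds ratio implied by a logistic regression coefficient."""
    return math.exp(beta)


def percent_change_in_odds(odds_ratio: float) -> float:
    """Return the percent change in odds (not probability) implied by an odds ratio."""
    return (odds_ratio - 1.0) * 100.0


# Odds ratios as reported in the abstract; the dictionary itself is illustrative.
reported = {"substance use": 6.47, "social support": 0.07}

for predictor, odds_ratio in reported.items():
    beta = odds_ratio_to_coefficient(odds_ratio)
    change = percent_change_in_odds(odds_ratio)
    print(f"{predictor}: OR={odds_ratio}, coefficient={beta:.2f}, "
          f"change in odds={change:+.0f}%")
```

Under this reading, an OR of 6.47 corresponds to a coefficient of about 1.87 and roughly 547% higher odds of pregnancy complications, while an OR of 0.07 corresponds to a coefficient of about -2.66 and roughly 93% lower odds. These are changes in odds, not in probability.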