Empowering Healthcare Practitioners with Language Models: Structuring Speech Transcripts in Two Real-World Clinical Applications
Journal: arXiv
Published Date: Jul 7, 2025
Abstract
Large language models (LLMs) such as GPT-4o and o1 have demonstrated strong
performance on clinical natural language processing (NLP) tasks across multiple
medical benchmarks. Nonetheless, two high-impact NLP tasks (structured tabular
reporting from nurse dictations and medical order extraction from
doctor-patient consultations) remain underexplored because clinical data are
scarce and sensitive, despite active industry efforts. Practical solutions to these
real-world clinical tasks can significantly reduce the documentation burden on
healthcare providers, allowing greater focus on patient care. In this paper, we
investigate these two challenging tasks using private and open-source clinical
datasets, evaluating the performance of both open- and closed-weight LLMs, and
analyzing their respective strengths and limitations. Furthermore, we propose
an agentic pipeline for generating realistic, non-sensitive nurse dictations,
enabling structured extraction of clinical observations. To support further
research in both areas, we release SYNUR and SIMORD, the first open-source
datasets for nurse observation extraction and medical order extraction.