Understanding Clinician Edits to Ambient AI Draft Notes: A Feasibility Analysis Using Large Language Models
Journal: medRxiv
Published Date: Mar 2, 2026
Abstract
Ambient AI documentation tools generate draft notes that clinicians can review and edit before signing them in the electronic health record. Scalable computational approaches for characterizing how clinicians modify these drafts remain limited, yet they are essential for evaluating and improving AI effectiveness. We examined the feasibility of using a few-shot prompted large language model (LLM) to categorize sentence-level edits between AI drafts and final documentation. We developed five label-specific binary models targeting medication, symptom, diagnosis, orders/tests/procedures, and social history edits, and refined the prompts using adversarial negatives and verification gates. Performance was evaluated against a human-annotated corpus. The medication and symptom models achieved promising performance (F1 = 0.787 and 0.780, respectively), whereas the remaining models were limited by low precision. Errors clustered in long, complex edits and at ambiguous category boundaries. These findings suggest that prompt engineering can reliably categorize edits with explicit cues, whereas for complex, context-dependent categories, prompted models are better suited to triage, flagging edits for human review.
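The approach described in the abstract rests on two steps that do not involve the LLM. First, it extracts sentence-level edits between a draft and its final note. Second, it scores each label-specific binary model against human labels with precision, recall, and F1. The Python below is a minimal sketch of both steps. It assumes a naive regex sentence splitter and alignment with `difflib.SequenceMatcher`. The abstract does not say how sentences were split or aligned, and the function names `split_sentences`, `sentence_edits`, and `binary_prf` are hypothetical. The few-shot LLM classifier itself is not shown.

```python
import difflib
import re


def split_sentences(text: str) -> list[str]:
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    # Clinical abbreviations such as "b.i.d." would be split incorrectly.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def sentence_edits(draft: str, final: str) -> list[tuple[str, str, str]]:
    """Return (operation, draft_text, final_text) for every non-matching block
    of sentences. Operation is 'replace', 'delete', or 'insert'."""
    a, b = split_sentences(draft), split_sentences(final)
    # Whole sentences are the units being aligned, not characters.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    edits = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            edits.append((tag, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return edits


def binary_prf(gold: list[bool], pred: list[bool]) -> tuple[float, float, float]:
    """Precision, recall, and F1 for one binary label (e.g. 'medication edit')."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    draft = "Patient reports headache. Started ibuprofen 400 mg. Follow up in 2 weeks."
    final = "Patient reports headache and nausea. Started ibuprofen 400 mg. Follow up in 2 weeks."
    print(sentence_edits(draft, final))
    print(binary_prf([True, True, False, False, True], [True, False, True, False, True]))
```

Aligning on whole sentences turns a one-phrase change into a sentence-level `replace`, which matches the sentence-level unit the study labels. A model that flags many edits that are not positives lowers precision but not recall. A precision-limited model of this kind can still serve the triage role the abstract describes, because its positive predictions form a candidate set for human review.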