An LLM-Based Comparison of Ambient AI Scribes for Clinical Documentation
Journal: medRxiv
Published Date: Jan 1, 2025
Abstract
Ambient AI scribes have become an increasingly promising option for automating clinical documentation, with dozens of enterprise solutions available. It remains uncertain whether models with domain-specific tuning outperform naive models “out of the box.” This study evaluated five commercial AI scribes, a custom solution built on the base GPT-o1 model without fine-tuning, and an experienced human scribe in a series of simulated clinical encounters. The notes generated by each were scored by large language models (LLMs) using a rubric assessing completeness, organization, accuracy, complexity handling, conciseness, and adaptability. Our naive solution achieved scores comparable to those of industry-leading solutions across all rubric dimensions. These findings suggest that the added value of domain-specific training in ambient AI medical scribes may be limited when compared to base foundation models.