AI-Assisted Medical Documentation in a Multilingual Swiss Health Care System: Proof-of-Concept Study.
Journal: JMIR AI
Published Date: Jun 5, 2026
Abstract
BACKGROUND: Medical documentation imposes a significant administrative burden on physicians and reduces time for direct patient care. Artificial intelligence (AI)-assisted tools such as automatic speech recognition and large language models (LLMs) promise to reduce this burden, but their performance in multilingual environments has not been explored. Switzerland is highly multilingual, and non-native German-speaking physicians may find documentation particularly challenging. OBJECTIVE: This study aimed to compare the efficiency and documentation quality of four clinical documentation workflows, including both AI-assisted and traditional methods, in a linguistically diverse Swiss tertiary hospital setting. METHODS: In this proof-of-concept study at a Swiss tertiary hospital (Department of Plastic and Hand Surgery, Cantonal Hospital Aarau), two physicians, a native Swiss German speaker and a non-native German speaker, documented encounters with simulated patients presenting with common hand disorders. Four documentation workflows were tested: (1) traditional dictation with transcription by a secretary; (2) real-time dictation using speech recognition software for voice-to-text transcription; (3) postencounter dictation transcribed by an AI model (Whisper) and processed by a GPT-based agent; and (4) AI-assisted ambient dictation of entire appointments using audio recording and automatic transcription. Documentation efficiency was measured as recorded physician time per report, and note quality was assessed with a modified Physician Documentation Quality Instrument (PDQI-9), scored by three different LLMs. To protect patient privacy, only synthetic (simulated) patient data were used. RESULTS: AI-assisted workflows, particularly workflow 4 (AI-assisted ambient dictation), produced the shortest physician documentation times per report.
In post hoc comparisons, workflow 4 was significantly faster than the speech-recognition-only workflow (workflow 2) for both physicians (adjusted P<.001). For the non-native speaker, workflow 4 was not significantly faster than traditional dictation (workflow 1) after adjustment (P=.08). The LLM evaluators assigned high absolute scores (median PDQI-9 >47/50); however, inter-rater reliability was poor (Krippendorff's alpha=-0.433, 95% CI -0.444 to -0.416), indicating systematic disagreement that precludes definitive conclusions about documentation quality from these scores alone. CONCLUSIONS: AI-assisted documentation yielded significant time savings for the native speaker, whereas the reduction for the non-native speaker did not reach statistical significance in this pilot (P=.08). Such tools show potential to alleviate the linguistic challenges faced by non-native speakers, reduce administrative burden, and allow physicians to spend more time with patients. However, the inconsistency of AI-based quality scoring suggests that LLMs cannot yet reliably replace human evaluation. Future studies should evaluate these workflows in real-world clinical implementation, address data privacy and security issues, and include human evaluators to validate the benefits observed in this study.
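How can high median scores coexist with a negative Krippendorff's alpha? The small computation below illustrates this. It is a minimal sketch of Krippendorff's alpha using the interval metric (squared difference between scores). The choice of metric is an assumption, because the abstract does not state which distance metric or confidence-interval method the authors used. The function name `krippendorff_alpha_interval` and the toy scores are hypothetical and are not the study's code or data. Alpha is 1 minus the ratio of observed within-note disagreement to the disagreement expected if scores were paired at random, so it falls below 0 whenever raters disagree on the same note more than chance would predict.

```python
from itertools import permutations


def krippendorff_alpha_interval(units):
    """Krippendorff's alpha with the interval metric (squared difference).

    Hypothetical helper for illustration. `units` holds one list of scores per
    rated note, one score per rater. Notes with fewer than two scores are not
    pairable and are skipped.
    """
    pairable = [[float(v) for v in u] for u in units if len(u) >= 2]
    values = [v for u in pairable for v in u]  # all pairable scores, pooled
    n = len(values)
    if n < 2:
        raise ValueError("need at least two pairable ratings")

    def delta(a, b):
        return (a - b) ** 2  # interval-scale distance between two scores

    # Observed disagreement: ordered pairs of scores within the same note,
    # each note weighted by 1 / (number of raters for that note - 1).
    observed = sum(
        sum(delta(a, b) for a, b in permutations(u, 2)) / (len(u) - 1)
        for u in pairable
    )
    # Expected disagreement: ordered pairs across all pooled scores, ignoring notes.
    expected = sum(delta(a, b) for a, b in permutations(values, 2))
    if expected == 0:
        raise ValueError("ratings show no variation; alpha is undefined")
    return 1.0 - (n - 1) * observed / expected


# Toy data (not from the study): two raters give high scores (48-50 of 50)
# but consistently rank the notes in opposite order.
toy_scores = [[48, 50], [50, 48], [48, 50], [50, 48]]
print(krippendorff_alpha_interval(toy_scores))  # -0.75
```

In this toy case, every score lies near the top of the scale. Alpha is nonetheless -0.75, because disagreement between raters on the same note (4.0 per pairable score) exceeds the disagreement expected from random pairing (about 2.29). High absolute scores therefore say nothing about whether raters agree on which notes are better.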