Large language models for self-administered conversational vignette assessment of provider competencies: A pilot and validation study in Vietnam with automated LLM-powered transcript classification
Journal: medRxiv
Published Date: Mar 4, 2026
Abstract
We developed and validated a self-administered clinical vignette platform powered by a large language model (LLM) and deployed through a SurveyCTO web survey to measure primary health care provider competencies in Vietnam. In a pilot focus group, nine physicians rated LLM-simulated patient interactions as realistic (mean 3.78/5) and user-friendly. In the validation phase, 22 providers completed 132 vignette interactions across ten clinical scenarios in Vietnamese. Essential diagnostic checklist scores, human-coded from translated transcripts, correlated with expert clinician evaluations (Pearson's r = 0.55-0.60). LLM-automated coding of checklist items from translated English transcripts correlated moderately with human coding (r = 0.53), and coding directly from Vietnamese transcripts performed comparably (r = 0.51), suggesting that a separate translation step may not be necessary. The total cost of the 132 chatbot interactions was under USD 2. LLM-driven conversational vignettes offer a low-cost, scalable method for assessing provider competencies in respondents' local language. They remove the need for extensive enumeration staff while preserving the open-ended format critical to vignette validity, and they allow flexible feature extraction from transcripts using grading rubrics. The platform is open-source and designed for replication in other health system contexts.
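To make the agreement statistic concrete, the following is a minimal sketch of how per-transcript checklist scores from two coders could be compared with Pearson's r. It is not the study's code: the function names (`checklist_score`, `pearson_r`), the four-item checklist, and all item codings are hypothetical and chosen purely for illustration.

```python
from math import sqrt


def checklist_score(items_done):
    """Fraction of essential checklist items coded as completed (0.0 to 1.0)."""
    return sum(items_done) / len(items_done)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return cov / sqrt(var_x * var_y)


# Hypothetical codings (1 = item asked or performed, 0 = not) for four
# made-up transcripts; these are NOT data from the study.
human_coding = [[1, 1, 0, 1], [0, 1, 0, 0], [1, 1, 1, 1], [0, 0, 1, 0]]
llm_coding = [[1, 1, 0, 0], [0, 1, 0, 0], [1, 1, 1, 1], [1, 0, 1, 0]]

human_scores = [checklist_score(t) for t in human_coding]
llm_scores = [checklist_score(t) for t in llm_coding]

r = pearson_r(human_scores, llm_scores)
print(f"Pearson's r between human and LLM checklist scores: {r:.2f}")
```

The same comparison could be run twice, once with LLM codes derived from translated English transcripts and once with codes derived from the original Vietnamese transcripts, to see whether the translation step changes agreement with human coding.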