Comparisons between a Large Language Model-based Real-Time Compound Diagnostic Medical AI Interface and Physicians for Common Internal Medicine Cases using Simulated Patients
Journal:
arXiv
Published Date:
May 27, 2025
Abstract
Objective To develop an LLM based realtime compound diagnostic medical AI
interface and performed a clinical trial comparing this interface and
physicians for common internal medicine cases based on the United States
Medical License Exam (USMLE) Step 2 Clinical Skill (CS) style exams. Methods A
nonrandomized clinical trial was conducted on August 20, 2024. We recruited one
general physician, two internal medicine residents (2nd and 3rd year), and five
simulated patients. The clinical vignettes were adapted from the USMLE Step 2
CS style exams. We developed 10 representative internal medicine cases based on
actual patients and included information available on initial diagnostic
evaluation. Primary outcome was the accuracy of the first differential
diagnosis. Repeatability was evaluated based on the proportion of agreement.
Results The accuracy of the physicians' first differential diagnosis ranged
from 50% to 70%, whereas the realtime compound diagnostic medical AI interface
achieved an accuracy of 80%. The proportion of agreement for the first
differential diagnosis was 0.7. The accuracy of the first and second
differential diagnoses ranged from 70% to 90% for physicians, whereas the AI
interface achieved an accuracy rate of 100%. The average time for the AI
interface (557 sec) was 44.6% shorter than that of the physicians (1006 sec).
The AI interface ($0.08) also reduced costs by 98.1% compared to the
physicians' average ($4.2). Patient satisfaction scores ranged from 4.2 to 4.3
for care by physicians and were 3.9 for the AI interface Conclusion An LLM
based realtime compound diagnostic medical AI interface demonstrated diagnostic
accuracy and patient satisfaction comparable to those of a physician, while
requiring less time and lower costs. These findings suggest that AI interfaces
may have the potential to assist primary care consultations for common internal
medicine cases.