Evaluating capabilities of large language models: Performance of GPT-4 on surgical knowledge assessments.
Journal: Surgery
Published Date: Jan 20, 2024
Abstract
BACKGROUND: Artificial intelligence has the potential to dramatically alter health care by enhancing how we diagnose and treat disease. One promising artificial intelligence model is ChatGPT, a general-purpose large language model trained by OpenAI. ChatGPT has shown human-level performance on several professional and academic benchmarks. We sought to evaluate its performance on surgical knowledge questions and assess the stability of this performance on repeat queries.