Assessing the Accuracy and Reliability of Large Language Models in Psychiatry Using Standardized Multiple-Choice Questions: Cross-Sectional Study.

Journal: Journal of Medical Internet Research
Published Date:

Abstract

BACKGROUND: Large language models (LLMs), such as OpenAI's GPT-3.5, GPT-4, and GPT-4o, have garnered early and significant enthusiasm for their potential applications within mental health, ranging from documentation support to chatbot therapy. Understanding the accuracy and reliability of the psychiatric "knowledge" stored within the parameters of these models, and developing measures of confidence in their responses (ie, the likelihood that an LLM response is accurate), are crucial for the safe and effective integration of these tools into mental health settings.
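One simple way to operationalize the "confidence measure" idea described above is to sample a model's answer to the same multiple-choice item several times and treat the rate of agreement across samples as a proxy for the likelihood that the response is accurate. The Python sketch below illustrates this under stated assumptions: ask_model is a hypothetical placeholder for a real LLM API call, and the example question, options, and sampling weights are invented for illustration. This is not the study's actual method, only a minimal sketch of one possible confidence proxy.

    from collections import Counter
    import random

    def agreement_confidence(ask_model, question, options, n_samples=10):
        # Query the model repeatedly on the same item and treat the
        # frequency of the most common answer as a rough confidence
        # score in [0, 1].
        answers = [ask_model(question, options) for _ in range(n_samples)]
        top_answer, top_count = Counter(answers).most_common(1)[0]
        return top_answer, top_count / n_samples

    # Hypothetical stand-in for a real LLM call: returns one of the
    # option letters, here with made-up weights for demonstration only.
    def ask_model(question, options):
        return random.choices(list(options), weights=[6, 2, 1, 1])[0]

    question = "Which medication requires regular serum level monitoring?"
    options = {"A": "Lithium", "B": "Sertraline",
               "C": "Buspirone", "D": "Trazodone"}
    answer, confidence = agreement_confidence(ask_model, question, options)
    print(f"Answer: {answer} ({options[answer]}), agreement: {confidence:.0%}")

Answer agreement is only one of several possible confidence proxies (token log-probabilities or verbalized confidence ratings are alternatives); its practical appeal is that it works with any black-box chat API, requiring no access to model internals.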

Authors

  • Kaitlin Hanss
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Karthik V Sarma
    Department of Bioengineering, University of California, Los Angeles, Los Angeles, CA, United States.
  • Anne L Glowinski
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Andrew Krystal
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Ramotse Saunders
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Andrew Halls
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Sasha Gorrell
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.
  • Erin Reilly
    Department of Psychiatry and Behavioral Sciences, University of California, San Francisco, San Francisco, CA, United States.