The answer may vary: large language model response patterns challenge their use in test item analysis.

Journal: Medical Teacher

Published Date: May 4, 2025

Abstract

INTRODUCTION: Validating multiple-choice question (MCQ)-based assessments typically requires administering them to a test population, which is resource-intensive and logistically demanding. Large language models (LLMs) are promising tools for many aspects of assessment development, including estimating the psychometric properties of test items. This study investigated whether LLMs could predict the difficulty and point-biserial indices of MCQs, potentially reducing the need for preliminary analysis in a test population.
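
For readers unfamiliar with the two indices named above, the following is a minimal sketch of how they are conventionally computed from scored responses in classical test theory. It is not taken from the article: the function names `item_difficulty` and `point_biserial` and the sample data are hypothetical, and the sketch assumes 0/1 item scoring, a total score that includes the item itself (uncorrected), and the population standard deviation.

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Difficulty index: proportion of examinees who answered correctly.

    Assumes item_scores is a list of 0/1 values, one per examinee.
    """
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial index: correlation of a 0/1 item score with total score.

    Uses r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the
    mean total scores of examinees who got the item right and wrong, s is the
    population standard deviation of total scores, and p is the difficulty.
    """
    p = item_difficulty(item_scores)
    if p == 0 or p == 1:
        raise ValueError("index is undefined when all responses are identical")
    s = pstdev(total_scores)
    if s == 0:
        raise ValueError("index is undefined when total scores do not vary")
    correct = [t for i, t in zip(item_scores, total_scores) if i == 1]
    incorrect = [t for i, t in zip(item_scores, total_scores) if i == 0]
    return (mean(correct) - mean(incorrect)) / s * (p * (1 - p)) ** 0.5


# Hypothetical data: five examinees, one item, and their total test scores.
item = [1, 1, 0, 1, 0]
totals = [5, 4, 2, 3, 1]
print(item_difficulty(item))
print(point_biserial(item, totals))
```

A higher difficulty index means an easier item, and a higher point-biserial index means the item better separates high scorers from low scorers. Both are normally estimated from real examinee responses.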

Authors

Lauren K Buhl

Department of Anesthesiology, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA.

External Resources

PubMed ID (PMID): 40319392
