The answer may vary: large language model response patterns challenge their use in test item analysis.

Journal: Medical Teacher
Published Date:

Abstract

INTRODUCTION: The validation of multiple-choice question (MCQ)-based assessments typically requires administration to a test population, which is resource-intensive and practically demanding. Large language models (LLMs) are promising tools for many aspects of assessment development, including the challenge of determining the psychometric properties of test items. This study investigated whether LLMs could predict the difficulty and point-biserial indices of MCQs, potentially alleviating the need for preliminary analysis in a test population.
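
For readers unfamiliar with the two indices named above, the following is a minimal sketch of how they are conventionally computed from scored responses. It is not the study's method: the function names and the small response matrix are hypothetical, and the sketch assumes dichotomous (0/1) item scoring with the uncorrected total score (the item itself is included in the total).

```python
from statistics import mean, pstdev


def item_difficulty(item_scores):
    """Difficulty index: proportion of examinees who answered correctly (0/1 scores)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial index: correlation between a 0/1 item score and the total score.

    Uses r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the mean
    total scores of examinees who got the item right and wrong, s is the
    population standard deviation of total scores, and p is the item difficulty.
    """
    correct = [t for s, t in zip(item_scores, total_scores) if s == 1]
    incorrect = [t for s, t in zip(item_scores, total_scores) if s == 0]
    sd = pstdev(total_scores)
    # The index is undefined if everyone (or no one) answered correctly,
    # or if all examinees have the same total score.
    if not correct or not incorrect or sd == 0:
        return float("nan")
    p = len(correct) / len(item_scores)
    return (mean(correct) - mean(incorrect)) / sd * (p * (1 - p)) ** 0.5


# Hypothetical data: 5 examinees (rows) by 3 items (columns), 1 = correct.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]
totals = [sum(row) for row in responses]

for j in range(len(responses[0])):
    scores = [row[j] for row in responses]
    print(j, item_difficulty(scores), point_biserial(scores, totals))
```

Both indices depend on observed responses from a test population, which is why estimating them without such a population is the challenge the study addresses.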

Authors

  • Lauren K Buhl
    Department of Anesthesiology, Dartmouth Hitchcock Medical Center, Lebanon, NH, USA.
