The answer may vary: large language model response patterns challenge their use in test item analysis.
Journal:
Medical Teacher
Published Date:
May 4, 2025
Abstract
INTRODUCTION: The validation of multiple-choice question (MCQ)-based assessments typically requires administration to a test population, which is resource-intensive and practically demanding. Large language models (LLMs) are a promising tool to aid in many aspects of assessment development, including the challenge of determining the psychometric properties of test items. This study investigated whether LLMs could predict the difficulty and point-biserial indices of MCQs, potentially reducing the need for preliminary analysis in a test population.
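For readers unfamiliar with the two indices named above, the following is a minimal sketch of how they are conventionally computed from examinee responses in classical item analysis. It is not the study's method. The function names and the toy data are hypothetical, items are assumed to be scored 0/1, and the total score is assumed to include the item itself (the uncorrected form).

```python
from statistics import mean, pstdev


def difficulty_index(item_scores):
    """Proportion of examinees who answered the item correctly (0/1 scores)."""
    return mean(item_scores)


def point_biserial(item_scores, total_scores):
    """Point-biserial correlation between a 0/1 item and total test scores.

    Uses r_pb = (M1 - M0) / s * sqrt(p * (1 - p)), where M1 and M0 are the
    mean total scores of examinees who got the item right and wrong, s is the
    population standard deviation of total scores, and p is the difficulty index.
    """
    p = mean(item_scores)
    s = pstdev(total_scores)
    if p in (0, 1) or s == 0:
        return float("nan")  # undefined when the item or the totals do not vary
    right = [t for i, t in zip(item_scores, total_scores) if i == 1]
    wrong = [t for i, t in zip(item_scores, total_scores) if i == 0]
    return (mean(right) - mean(wrong)) / s * (p * (1 - p)) ** 0.5


# Hypothetical toy data: four examinees, one item, and their total scores.
item = [1, 1, 0, 0]
totals = [4, 3, 2, 1]
print(difficulty_index(item))                  # 0.5
print(round(point_biserial(item, totals), 3))  # 0.894
```

Both values depend on having responses from a test population, which is the step the study asks whether LLM predictions could reduce.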