Performance of ChatGPT on optometry and vision science exam questions.

Journal: Ophthalmic & Physiological Optics: The Journal of the British College of Ophthalmic Opticians (Optometrists)

Abstract

The rapid proliferation of large language model (LLM) tools, such as ChatGPT developed by OpenAI, presents both a challenge and an opportunity for educators. While LLMs can generate convincing written responses across a wide range of academic fields, their capabilities vary noticeably across different models, fields and even sub-fields. This paper aims to evaluate the capabilities of LLMs in the field of optometry and vision science by analysing the quality of the responses generated by ChatGPT to sample long-answer questions covering different sub-fields of optometry, namely binocular vision, clinical communication, dispensing and ocular pathology. It also explores the possibility of LLMs being used as virtual graders. The capabilities of ChatGPT were examined across several GPT models (GPT-3.5, GPT-4 and o1, from oldest to newest), first by investigating the concordance between ChatGPT and a human grader, and then by benchmarking the performance of these models on sample questions in optometry and vision science. Statistical analyses included mixed-effects analysis, the Friedman test, the Wilcoxon signed-rank test and thematic analysis. ChatGPT graders awarded higher marks than human graders, although the difference was significant only for GPT-3.5 (p < 0.05). Benchmarking on sample questions demonstrated that all GPT models could generate satisfactory responses above the 50% 'pass' score in many cases (p < 0.05), albeit with performance varying significantly across sub-fields (p < 0.0001) and models (p = 0.0003). Newer models significantly outperformed older models in most cases. The frequency of thematic response errors was more mixed between the GPT-3.5 and GPT-4 models (p < 0.05 to p > 0.99), while o1 made no thematic errors. These findings indicate that ChatGPT may affect learning and teaching practices in this field. The inconsistent performance across sub-fields, together with implementation considerations such as ethics and transparency, supports a judicious adaptation of assessment practices and adoption of the technology in optometry and vision science education.
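
To illustrate the paired grader comparison described above, the sketch below runs a Wilcoxon signed-rank test on marks awarded by a human grader and by an LLM grader. It is a minimal example only: the mark values and variable names are invented placeholders, not data from the study, and this single test stands in for just one of the several analyses the paper reports.

    # Minimal sketch of a paired grader comparison using the Wilcoxon
    # signed-rank test (one of the analyses named in the abstract).
    # The marks below are hypothetical placeholders, not study data.
    from scipy.stats import wilcoxon

    # Paired marks (out of 10) awarded to the same eight answers.
    human_marks = [6.0, 7.5, 5.0, 8.0, 6.5, 7.0, 4.5, 9.0]
    llm_marks   = [7.0, 8.0, 6.5, 8.5, 7.5, 7.0, 6.0, 9.5]

    # A small p-value would suggest the LLM grader systematically
    # awards different (here, higher) marks than the human grader.
    stat, p_value = wilcoxon(human_marks, llm_marks)
    print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.4f}")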

Authors

  • Nayuta Yoshioka
    School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
  • Vanessa Honson
    School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
  • Revathy Mani
    School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
  • Sharon Oberstein
    School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
  • Kathleen Watt
    School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
  • Vinod Maseedupally
School of Optometry and Vision Science, UNSW Australia, Sydney, New South Wales, Australia.
