The Evaluation of Generative AI Should Include Repetition to Assess Stability.

Journal: JMIR mHealth and uHealth
Published Date:

Abstract

The growing interest in applying generative artificial intelligence (AI) models such as ChatGPT in health care has prompted numerous studies of their performance in various medical contexts. However, evaluating ChatGPT poses unique challenges because of the inherent randomness in its responses. Unlike traditional AI models, ChatGPT can generate different responses to the same input, making it imperative to assess its stability through repetition. This commentary highlights the importance of including repetition in the evaluation of ChatGPT to ensure the reliability of conclusions drawn about its performance. Just as biological experiments often require multiple repetitions for validity, we argue that assessing generative AI models like ChatGPT demands repeated measurement. Failure to account for the impact of repetition can lead to biased conclusions and undermine the credibility of research findings. We urge researchers to incorporate appropriate repetition in their studies from the outset and to report their methods transparently to enhance the robustness and reproducibility of findings in this rapidly evolving field.
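
As an illustration of the kind of repeated evaluation argued for above, the following is a minimal Python sketch, not a method taken from the commentary. It assumes a hypothetical `ask_model` callable that returns one answer string per question; `evaluate_with_repetition`, `make_stub_model`, and the agreement measure (the share of repeats matching the most common response) are illustrative names and choices. The stub model only simulates randomness and does not call any real service.

```python
import random
from collections import Counter
from statistics import mean, stdev


def evaluate_with_repetition(ask_model, questions, answers, n_repeats=5):
    """Ask every question n_repeats times; summarize accuracy and stability."""
    per_run_accuracy = []
    responses = {q: [] for q in questions}
    for _ in range(n_repeats):
        correct = 0
        for question, gold in zip(questions, answers):
            reply = ask_model(question)
            responses[question].append(reply)
            correct += int(reply == gold)
        per_run_accuracy.append(correct / len(questions))
    # Agreement: share of repeats that match the most common response.
    agreement = {
        q: Counter(rs).most_common(1)[0][1] / n_repeats
        for q, rs in responses.items()
    }
    return {
        "per_run_accuracy": per_run_accuracy,
        "mean_accuracy": mean(per_run_accuracy),
        # Sample standard deviation needs at least two runs.
        "sd_accuracy": stdev(per_run_accuracy) if n_repeats > 1 else 0.0,
        "agreement": agreement,
    }


def make_stub_model(seed=0):
    """Hypothetical stand-in for a generative model: answers at random."""
    rng = random.Random(seed)

    def ask(question):
        return rng.choice(["A", "B"])

    return ask


if __name__ == "__main__":
    summary = evaluate_with_repetition(
        make_stub_model(), ["q1", "q2", "q3"], ["A", "B", "A"], n_repeats=5
    )
    print(summary)
```

Reporting the per-run accuracies together with their mean and spread, rather than the result of a single run, makes the variability across repetitions visible to readers.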

Authors

  • Lingxuan Zhu
    Department of Urology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China; Department of Etiology and Carcinogenesis, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College, Beijing, China; Changping Laboratory, Beijing, China.
  • Weiming Mou
    Department of Urology, Shanghai General Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai, China.
  • Chenglin Hong
    Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China.
  • Tao Yang
    The First Clinical Medical College, The Affiliated People's Hospital of Fujian University of Traditional Chinese Medicine, Fuzhou, Fujian, China.
  • Yancheng Lai
    Department of Urology, Renji Hospital, Shanghai Jiao Tong University School of Medicine, Shanghai 200127, China; The First School of Clinical Medicine, Southern Medical University, Guangzhou, China.
  • Chang Qi
    Institute of Logic and Computation, TU Wien, Austria.
  • Anqi Lin
    Department of Oncology, Zhujiang Hospital, Southern Medical University, Guangzhou, China.
  • Jian Zhang
    College of Pharmacy, Ningxia Medical University, Yinchuan, Ningxia Hui Autonomous Region, China.
  • Peng Luo
    Institute of Forest Resource Information Techniques, Chinese Academy of Forestry, Beijing 100091, PR China.