Zero-Shot Evaluation of Kimi K2 on Pediatric Clinical Cases

Journal: medRxiv

Published Date: Jan 1, 2025

Abstract

Background: The application of large language models (LLMs) in pediatric medicine requires rigorous performance evaluation prior to clinical implementation. Objective: To evaluate the accuracy of the Kimi K2 model in analyzing pediatric clinical cases using a zero-shot approach. Methods: A total of 2,249 multiple-choice questions based on pediatric clinical cases, with patient ages ranging from 1 day to 16 years, were extracted from the MedQA dataset and analyzed. The model was queried via API with standardized parameters, temperature set to zero, and zero-shot prompts. Accuracy was calculated by comparing the model’s responses with the dataset’s ground truth. Results: Kimi K2 achieved an overall accuracy of 78.39% (1,763 correct answers out of 2,249), and 100% of responses were in the required format. Conclusions: The model demonstrates competitive performance for medical education and diagnostic support, but it still has limitations that require human clinical supervision.
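The accuracy figure in the abstract is the share of model answers that match the dataset's ground truth. The following is a minimal sketch of that calculation, not the authors' code: the function name `accuracy` and the four-item toy answer lists are hypothetical, and only the counts 1,763 and 2,249 come from the abstract.

```python
def accuracy(predictions, ground_truth):
    """Return the fraction of predictions that exactly match the reference answers."""
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")
    if not ground_truth:
        raise ValueError("cannot compute accuracy on an empty answer set")
    correct = sum(p == g for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)


# Hypothetical toy data (not from the study): 3 of 4 answers match.
toy_predictions = ["A", "B", "C", "D"]
toy_ground_truth = ["A", "B", "D", "D"]
toy_accuracy = accuracy(toy_predictions, toy_ground_truth)

# Counts reported in the abstract: 1,763 correct out of 2,249 questions.
reported_correct = 1763
reported_total = 2249
reported_accuracy_pct = round(100 * reported_correct / reported_total, 2)

print(toy_accuracy)           # 0.75
print(reported_accuracy_pct)  # 78.39
```

Recomputing the percentage from the reported counts reproduces the 78.39% stated in the abstract.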

Authors

Gianluca Mondillo; Mariapia Masino; Simone Colosimo; Alessandra Perrotta; Vittoria Frattolillo; Fabio Giovanni Abbate

