MedAgentGym: Training LLM Agents for Code-Based Medical Reasoning at Scale

Journal: arXiv

Published Date: Jun 4, 2025

Abstract

We introduce MedAgentGym, the first publicly available training environment designed to enhance coding-based medical reasoning capabilities in large language model (LLM) agents. MedAgentGym comprises 72,413 task instances across 129 categories derived from authentic real-world biomedical scenarios. Tasks are encapsulated within executable coding environments, each featuring detailed task descriptions, interactive feedback mechanisms, verifiable ground-truth annotations, and scalable training trajectory generation. Extensive benchmarking of over 30 LLMs reveals a notable performance disparity between commercial API-based models and open-source counterparts. Leveraging MedAgentGym, Med-Copilot-7B achieves substantial performance gains through supervised fine-tuning (+36.44%) and continued reinforcement learning (+42.47%), emerging as an affordable and privacy-preserving alternative competitive with gpt-4o. By offering both a comprehensive benchmark and accessible, expandable training resources within unified execution environments, MedAgentGym delivers an integrated platform to develop LLM-based coding assistants for advanced biomedical research and practice.

Authors

Ran Xu
Yuchen Zhuang
Yishan Zhong
Yue Yu
Xiangru Tang
Hang Wu
May D. Wang
Peifeng Ruan
Donghan Yang
Tao Wang
Guanghua Xiao
Carl Yang
Yang Xie
Wenqi Shi

External Resources

arXiv: http://arxiv.org/abs/2506.04405v1
