A-Eval: A benchmark for cross-dataset and cross-modality evaluation of abdominal multi-organ segmentation.

Journal: Medical Image Analysis

PMID: 39970528

Abstract

Although deep learning has revolutionized abdominal multi-organ segmentation, models often generalize poorly because they are trained on small-scale datasets limited to specific sources and modalities. The recent emergence of large-scale datasets may mitigate this issue, but some important questions remain unanswered: Can models trained on these large datasets generalize well across different datasets and imaging modalities? Whatever the answer, how can we further improve their generalizability? To address these questions, we introduce A-Eval, a benchmark for the cross-dataset and cross-modality Evaluation ('Eval') of Abdominal ('A') multi-organ segmentation, integrating seven datasets across CT and MRI modalities. Our evaluations indicate that significant domain gaps persist despite larger data scales. While training on more datasets improves generalization, model performance on unseen data remains inconsistent. Joint training across multiple datasets and modalities enhances generalization, though annotation inconsistencies pose challenges. These findings highlight the need for diverse and well-curated training data across various clinical scenarios and modalities to develop robust medical imaging models. The code and pre-trained models are available at https://github.com/uni-medical/A-Eval.

Authors

Ziyan Huang

Institute of Medical Robotics, Shanghai Jiao Tong University, Shanghai, China; Shanghai AI Laboratory, Shanghai, China.
Zhongying Deng
Jin Ye

Division of Gastroenterology, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Haoyu Wang

North Carolina State University, Department of Statistics, Raleigh, North Carolina, USA.
Yanzhou Su

Shanghai Artificial Intelligence Laboratory, Shanghai, 200000, China.
Tianbin Li

State Key Laboratory of Geohazard Prevention and Geoenvironment Protection, Chengdu University of Technology, Chengdu 610059, China.
Hui Sun

Department of Thyroid Surgery, China-Japan Union Hospital of Jilin University, Jilin University, Changchun, China.
Junlong Cheng

College of Information Science and Engineering, Xinjiang University, Urumqi 830000, China; Key Laboratory of software engineering technology, Xinjiang University, China.
Jianpin Chen

School of Biomedical Engineering, Shanghai Jiao Tong University, Shanghai, 200240, China; Shanghai Artificial Intelligence Laboratory, Shanghai, 200000, China.
Junjun He

ShenZhen Key Lab of Computer Vision and Pattern Recognition, Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences, Shenzhen, People's Republic of China; Shanghai AI Laboratory, Shanghai, People's Republic of China; Shanghai Jiao Tong University, Shanghai, People's Republic of China.
Yun Gu

Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, SEIEE Building 2-427, No. 800, Dongchuan Road, Minhang District, Shanghai, 200240 China.
Shaoting Zhang
Lixu Gu
Yu Qiao

Department of English and American Studies, RWTH Aachen University, Aachen, North Rhine-Westphalia, Germany.

Keywords

Abdomen; Benchmarking; Deep Learning; Humans; Magnetic Resonance Imaging; Tomography, X-Ray Computed
