ScaleMAI: Accelerating the Development of Trusted Datasets and AI Models
Journal: arXiv
Published Date: Jan 6, 2025
Abstract
Building trusted datasets is critical for transparent and responsible Medical
AI (MAI) research, but creating even small, high-quality datasets can take
years of effort from multidisciplinary teams. This process often delays AI
benefits, as human-centric data creation and AI-centric model development are
treated as separate, sequential steps. To overcome this, we propose ScaleMAI,
an agent for AI-integrated data curation and annotation, allowing data quality
and AI performance to improve in a self-reinforcing cycle and reducing
development time from years to months. We adopt pancreatic tumor detection as
an example. First, ScaleMAI progressively creates a dataset of 25,362 CT scans,
including per-voxel annotations for benign/malignant tumors and 24 anatomical
structures. Second, through progressive human-in-the-loop iterations, ScaleMAI
provides a Flagship AI Model that can approach the proficiency of expert
annotators (30 years of experience) in detecting pancreatic tumors. The Flagship Model
significantly outperforms models developed from smaller, fixed-quality
datasets, with substantial gains in tumor detection (+14%), segmentation (+5%),
and classification (72%) on three prestigious benchmarks. In summary, ScaleMAI
transforms the speed, scale, and reliability of medical dataset creation,
paving the way for a variety of impactful, data-driven applications.