Multiple instance learning approach for automated gallbladder cancer detection using ultrasound imaging: multi-center validation of a deep learning model with the public dataset contribution.
Journal:
The Lancet regional health. Southeast Asia
Published Date:
Mar 6, 2026
Abstract
BACKGROUND: Gallbladder cancer (GBC) diagnosis is challenging due to overlapping imaging features. We developed and validated a multiple instance learning (MIL) model for automated GBC detection using a large-scale multi-center ultrasound dataset and benchmarked it against state-of-the-art architectures.

METHODS: This was a retrospective and prospective multi-center cohort study. We trained a gated attention MIL (GAIA-MIL) model on the prospective AURORA-GB dataset (August 2022-July 2024) and two public datasets. The model was evaluated on a temporally independent internal test set (August 2024-December 2024) and three retrospective external cohorts. The area under the curve (AUC), sensitivity, and specificity of GAIA-MIL were compared with those of Clustering-constrained Attention MIL (CLAM), Dual-Stream MIL (DS-MIL), and Transformer-based MIL (TransMIL).

FINDINGS: The datasets comprised 11,012 images from 1,151 patients. Cross-validation achieved a mean AUC of 0.874 (95% CI 0.846-0.902). On the internal test set (n = 97), GAIA-MIL achieved 87.7% sensitivity (78.9-95.1%), 86.2% specificity (72.4-96.9%), and an AUC of 0.883 (0.786-0.963). Pooled external validation (n = 122) showed an AUC of 0.778 (0.698-0.852). Performance varied by external center (AUCs: 0.722, 0.950, and 0.749). In comparative benchmarking, TransMIL performed well internally (AUC 0.871), but its performance degraded significantly in external validation (pooled AUC 0.654). GAIA-MIL was more stable, maintaining pooled sensitivity (78.2%), specificity (73.4%), and AUC (0.778) across the external centers, where the more complex transformer-based model struggled. Interpretability analysis confirmed that the model focused on clinically relevant features such as wall thickening.

INTERPRETATION: While complex architectures like TransMIL perform well internally, GAIA-MIL offered the best balance of performance and generalizability for multi-center deployment among the models compared. The AURORA-GB benchmark dataset is publicly released to advance research.

FUNDING: None.
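For readers unfamiliar with the gated attention MIL approach named in METHODS, the pooling step can be sketched roughly as follows. This is a minimal, dependency-free illustration of generic gated attention pooling, not the authors' GAIA-MIL implementation: each image of a patient is treated as an instance with an embedding vector, a tanh branch and a sigmoid gate are combined into one score per image, and the softmax of those scores weights the images when forming a single patient-level embedding. The function name `gated_attention_pool`, the parameter names `V`, `U`, and `w`, the dimensions, and the random embeddings are all assumptions made for illustration.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for the gating branch."""
    return 1.0 / (1.0 + math.exp(-x))


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gated_attention_pool(H, V, U, w):
    """Pool a bag of instance embeddings into one bag embedding.

    H: K instance embeddings, each of length D (e.g. one per ultrasound image).
    V, U: hypothetical L x D projection matrices (tanh branch and sigmoid gate).
    w: hypothetical length-L vector mapping the gated features to a score.
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in H:
        tanh_branch = [math.tanh(x) for x in matvec(V, h)]
        gate_branch = [sigmoid(x) for x in matvec(U, h)]
        # Element-wise product of the two branches, projected to a scalar.
        scores.append(sum(wi * t * g for wi, t, g in zip(w, tanh_branch, gate_branch)))

    # Numerically stable softmax over the instances in the bag.
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Attention-weighted sum of the instance embeddings.
    dim = len(H[0])
    bag = [sum(a * h[d] for a, h in zip(weights, H)) for d in range(dim)]
    return bag, weights


# Illustrative bag: 8 images with 16-dimensional embeddings (random stand-ins).
random.seed(0)
K, D, L = 8, 16, 4
H = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
V = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(L)]
U = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(L)]
w = [random.gauss(0, 1) for _ in range(L)]

bag_embedding, attention = gated_attention_pool(H, V, U, w)
```

In a trained model the bag embedding would feed a classifier that outputs the patient-level cancer probability, and the attention weights indicate which images contributed most to that prediction. In a real model `V`, `U`, and `w` would be learned rather than drawn at random.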