End-to-end prediction of clinical outcomes in head and neck squamous cell carcinoma with foundation model-based multiple instance learning

Journal: medRxiv
Published Date:

Abstract

Foundation models (FMs) show promise in medical AI by learning flexible features from large datasets, potentially surpassing handcrafted radiomics. Outcome prediction of head and neck squamous cell carcinoma (HNSCC) with FMs using routine imaging remains unexplored. This study evaluates end-to-end FM-based multiple instance learning (MIL) for prediction of 2-year overall survival (OS), locoregional control (LRC), and freedom from distant metastasis (FFDM), and for risk group stratification, using pretreatment CT scans in HNSCC. We analyzed data from 2485 patients in three retrospective HNSCC cohorts (RADCURE, HN1, HN-PET-CT), treated between 2004 and 2017, with available pretreatment CTs and primary gross tumor volume (GTVp) segmentations. The RADCURE cohort was split into training (n=1464) and test (n=606) sets, with HN1 (n=131) and HN-PET-CT (n=284) as additional test cohorts. FM-based MIL models (2D, multiview and 3D) for 2-year endpoint prediction and risk stratification were evaluated using the area under the receiver operating characteristic curve (AUROC) and Kaplan-Meier (KM) analysis with hazard ratios (HR), compared with radiomics, and assessed for multimodal enhancement of clinical baselines. 2D MIL models achieved 2-year test AUROCs of 0.75-0.84 (OS), 0.66-0.75 (LRC) and 0.71-0.78 (FFDM), outperforming multiview and 3D MIL (AUROCs: 0.50-0.77, p≥0.15) and performing comparably or superior to radiomics (AUROCs: 0.64-0.74, p≥0.012). Significant stratification was observed (HRs: 2.14-4.77, p≤0.039). Multimodal enhancement of 2-year OS/FFDM prediction (AUROCs: 0.82-0.87, p≤0.018) was observed for patients without human papillomavirus-positive (HPV+) tumors. FM-based MIL demonstrates promise in HNSCC risk prediction, showing similar or superior performance to radiomics and enhancing clinical baselines in non-HPV+ patients. This is the first end-to-end study using both foundation models and multiple instance learning for outcome prediction in head and neck squamous cell carcinoma.
Multiple instance learning approaches predict clinically relevant 2-year endpoints and stratify patients across external cohorts with similar or better performance than handcrafted radiomics. Multimodal inclusion of clinical and multiple instance learning information improves clinical baseline models in patients without human papillomavirus-positive tumors.

Authors

  • Asier Rabasco Meneghetti; Marta Ligero Hernández; Jens-Peter Kuehn; Steffen Löck; Zunamys Itzel Carrero; Raquel Perez-Lopez; Keno Bressem; Titus K. Brinker; Alexander T. Pearson; Daniel Truhn; Sven Nebelung; Jakob Nikolas Kather

Categories