Comparative Analysis of Machine Learning Models for Lung Cancer Mutation Detection and Staging Using 3D CT Scans
Journal:
arXiv
Published Date:
May 28, 2025
Abstract
Lung cancer is the leading cause of cancer mortality worldwide, and
non-invasive methods for detecting key mutations and staging are essential for
improving patient outcomes. Here, we compare the performance of two machine
learning models: FMCIB+XGBoost, a supervised model with domain-specific
pretraining, and DINOv2+ABMIL, a self-supervised model with attention-based
multiple-instance learning. Both are evaluated on 3D lung nodule data from the
Stanford Radiogenomics and Lung-CT-PT-Dx cohorts. In KRAS and EGFR mutation
detection, FMCIB+XGBoost consistently outperformed DINOv2+ABMIL, achieving
accuracies of 0.846 and 0.883 for KRAS and EGFR mutations, respectively. In
cancer staging, DINOv2+ABMIL demonstrated competitive generalization, achieving
an accuracy of 0.797 for T-stage prediction in the Lung-CT-PT-Dx cohort, which
suggests that self-supervised learning (SSL) adapts well across diverse
datasets. Our results emphasize the clinical utility of supervised models in
mutation detection and highlight the potential of SSL to improve staging
generalization, while identifying mutation sensitivity as an area for
improvement.
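
As an illustration of the attention-based multiple-instance learning named
above, the following is a minimal sketch of attention pooling over a bag of
instance embeddings. It is not the authors' implementation: the function name
abmil_pool, the embedding sizes, and the random weights V and w are all
hypothetical, and treating each CT slice as one instance is an assumption. In
a real pipeline the embeddings would come from a pretrained encoder and the
weights would be learned.

```python
import numpy as np


def abmil_pool(instances, V, w):
    """Attention-based MIL pooling over one bag of instance embeddings.

    instances: (n_instances, d) array, assumed here to hold one embedding per CT slice.
    V: (d, h) projection matrix; w: (h,) scoring vector. Both are hypothetical
    stand-ins for parameters that would normally be learned.
    Returns the (d,) bag embedding and the (n_instances,) attention weights.
    """
    # Unnormalized attention score for each instance.
    scores = np.tanh(instances @ V) @ w
    # Softmax over instances, shifted by the max for numerical stability.
    scores = scores - scores.max()
    weights = np.exp(scores) / np.exp(scores).sum()
    # Bag embedding is the attention-weighted sum of instance embeddings.
    bag = weights @ instances
    return bag, weights


# Toy bag: 12 instances with 8-dimensional embeddings (sizes chosen arbitrarily).
rng = np.random.default_rng(0)
slices = rng.normal(size=(12, 8))
V = rng.normal(size=(8, 4))
w = rng.normal(size=4)

bag_embedding, attention = abmil_pool(slices, V, w)
```

The resulting bag embedding would then be passed to a classifier head for a
nodule-level prediction, and the attention weights indicate how much each
instance contributed to it.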