A computationally frugal open-source foundation model for thoracic disease detection in lung cancer screening programs
Journal:
arXiv
Published Date:
Jul 2, 2025
Abstract
Low-dose computed tomography (LDCT) imaging employed in lung cancer screening
(LCS) programs is increasing in uptake worldwide. LCS programs herald a
generational opportunity to simultaneously detect cancer and non-cancer-related
early-stage lung disease. Yet these efforts are hampered by a shortage of
radiologists to interpret scans at scale. Here, we present TANGERINE, a
computationally frugal, open-source vision foundation model for volumetric LDCT
analysis. Designed for broad accessibility and rapid adaptation, TANGERINE can
be fine-tuned off the shelf for a wide range of disease-specific tasks with
limited computational resources and training data. Relative to models trained
from scratch, TANGERINE demonstrates fast convergence during fine-tuning,
thereby requiring significantly fewer GPU hours, and displays strong label
efficiency, achieving comparable or superior performance with a fraction of
fine-tuning data. Pretrained using self-supervised learning on over 98,000
thoracic LDCTs, including the UK's largest LCS initiative to date and 27 public
datasets, TANGERINE achieves state-of-the-art performance across 14 disease
classification tasks, including lung cancer and multiple respiratory diseases,
while generalising robustly across diverse clinical centres. By extending a
masked autoencoder framework to 3D imaging, TANGERINE offers a scalable
solution for LDCT analysis, departing from recent closed, resource-intensive
models by combining architectural simplicity, public availability, and modest
computational requirements. Its accessible, open-source lightweight design lays
the foundation for rapid integration into next-generation medical imaging tools
that could transform LCS initiatives, allowing them to pivot from a singular
focus on lung cancer detection to comprehensive respiratory disease management
in high-risk populations.