Cascaded 3D Diffusion Models for Whole-body 3D 18-F FDG PET/CT synthesis from Demographics
Journal: arXiv
Published Date: May 28, 2025
Abstract
We propose a cascaded 3D diffusion model framework to synthesize
high-fidelity 3D PET/CT volumes directly from demographic variables, addressing
the growing need for realistic digital twins in oncologic imaging, virtual
trials, and AI-driven data augmentation. Unlike deterministic phantoms, which
rely on predefined anatomical and metabolic templates, our method employs a
two-stage generative process. An initial score-based diffusion model
synthesizes low-resolution PET/CT volumes from demographic variables alone,
providing global anatomical structures and approximate metabolic activity. This
is followed by a super-resolution residual diffusion model that refines spatial
resolution. Our framework was trained on 18-F FDG PET/CT scans from the AutoPET
dataset and evaluated using organ-wise volume and standardized uptake value
(SUV) distributions, comparing synthetic and real data across demographic
subgroups. The organ-wise comparison demonstrated strong concordance between
synthetic and real images. In particular, most deviations in metabolic uptake
values remained within 3-5% of the ground truth in subgroup analysis. These
findings highlight the potential of cascaded 3D diffusion models to generate
anatomically and metabolically accurate PET/CT images, offering a robust
alternative to traditional phantoms and enabling scalable, population-informed
synthetic imaging for clinical and research applications.
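
As a rough illustration of the subgroup comparison described above, the deviation in metabolic uptake can be expressed as the percent difference between the synthetic and real mean SUV for one organ within one demographic subgroup. The sketch below is a minimal example under that assumption. The function name `relative_deviation_percent` and the SUV values are hypothetical and chosen only for illustration; they are not taken from the paper, and the authors' exact metric may differ.

```python
from statistics import mean


def relative_deviation_percent(synthetic, real):
    """Percent deviation of the synthetic mean from the real mean.

    Hypothetical helper: assumes the comparison is made on per-subject
    mean SUVs for a single organ within a single demographic subgroup.
    """
    real_mean = mean(real)
    if real_mean == 0:
        raise ValueError("real mean is zero; relative deviation is undefined")
    return 100.0 * (mean(synthetic) - real_mean) / real_mean


# Illustrative per-subject mean SUVs for one organ (made-up values).
real_liver_suv = [2.0, 2.2, 1.8, 2.0]
synthetic_liver_suv = [2.1, 2.1, 2.0, 2.2]

deviation = relative_deviation_percent(synthetic_liver_suv, real_liver_suv)
print(f"Liver mean-SUV deviation: {deviation:.1f}%")
```

Repeating this calculation per organ and per subgroup would give a table of deviations that can be checked against a tolerance such as the 3-5% range quoted in the abstract.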