NucGen3D: a Synthetic Framework for Large-Scale 3D Nuclear Segmentation with Open-Source Training Data and Models
Journal:
bioRxiv
Published Date:
Jan 1, 2025
Abstract
Robust nuclear segmentation in 3D microscopy images is a critical yet unresolved challenge in quantitative cell biology, hindered by the scarcity and variability of annotated volumetric datasets. Because such data are difficult to obtain, most state-of-the-art approaches, including Cellpose, segment individual 2D slices and then heuristically reconstruct 3D volumes, thereby losing critical spatial context. Our analysis of expert annotator performance confirms that ignoring 3D context introduces substantial variability in nuclear detection and annotation. While a few 3D models have been trained on small or toy datasets, no large-scale, openly available resource currently exists to enable robust training of high-capacity 3D segmentation networks. To address this, we present NucGen3D, a customizable simulation framework that generates large-scale, annotated 3D microscopy datasets from limited 2D input, specifically the 2018 Data Science Bowl dataset. NucGen3D produces realistic 3D volumes across diverse biological and imaging scenarios, including variations in nuclear morphology, spatial arrangement, acquisition artifacts, and imaging noise. Using this synthetic data, we trained two models from scratch: a 2D convolutional neural network under Cellpose-like conditions, and a fully 3D convolutional model that extends the 2D settings. We evaluated both on a challenging, independent real-world dataset with complex nuclear architectures. Both models, especially the 3D model, consistently outperformed state-of-the-art methods, including those trained on larger annotated datasets or based on more complex architectures. These results demonstrate that synthetic data can effectively substitute for real 3D annotations when training high-performing models at scale.
To promote reproducibility and further research, we release both the NucGen3D framework and the fully trained 3D segmentation model as open source, making this the first end-to-end open resource for large-scale 3D nuclear segmentation.
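To make concrete the slice-wise strategy the abstract contrasts with fully 3D segmentation, the following is a minimal sketch of one common heuristic for reconstructing 3D labels from per-slice 2D masks: linking objects in adjacent slices by intersection-over-union (IoU). It is an illustration only, not the NucGen3D code and not Cellpose's implementation; the function name `stitch_slices` and the parameter `iou_threshold` (with its default value) are assumptions introduced here.

```python
import numpy as np


def stitch_slices(masks, iou_threshold=0.25):
    """Link per-slice 2D integer labels into 3D labels via IoU overlap.

    masks: array of shape (Z, Y, X); 0 is background, and labels are
    independent from slice to slice (as produced by a 2D segmenter).
    Hypothetical helper for illustration, not part of any library.
    """
    masks = np.asarray(masks)
    stitched = np.zeros_like(masks)
    next_label = 1

    for z in range(masks.shape[0]):
        for lab in np.unique(masks[z]):
            if lab == 0:
                continue  # skip background
            region = masks[z] == lab
            best_label, best_iou = 0, 0.0

            if z > 0:
                prev = stitched[z - 1]
                # Only objects that overlap this region can match it.
                for cand in np.unique(prev[region]):
                    if cand == 0:
                        continue
                    cand_region = prev == cand
                    inter = np.logical_and(region, cand_region).sum()
                    union = np.logical_or(region, cand_region).sum()
                    iou = inter / union
                    if iou > best_iou:
                        best_label, best_iou = cand, iou

            if best_iou >= iou_threshold:
                # Continue the object from the previous slice.
                stitched[z][region] = best_label
            else:
                # No sufficient overlap: start a new 3D object.
                stitched[z][region] = next_label
                next_label += 1

    return stitched
```

Because each slice is matched only to its immediate neighbor, a single missed detection splits one nucleus into two objects, and two touching nuclei in one slice can both inherit the same label from the slice above. These are the kinds of errors that arise when 3D context is discarded.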