Breaking the bottleneck: self-supervised deep learning framework for fully automated fossil CT segmentation

Journal: bioRxiv
Published Date:

Abstract

Semantic segmentation of domain-specific imaging data where labelled training examples are scarce and foreground-background contrast is low remains an open challenge in deep learning applied to science. Palaeontological computed tomography (CT) exemplifies this problem: digitally isolating fossilised bone from surrounding rock matrix is labour-intensive (≥100 hrs/dataset), subjective, and often reliant on expensive proprietary software, creating a segmentation bottleneck that prevents large-scale and rapid processing of CT data collections. Here we present a self-supervised, end-to-end framework combining SimCLR v1 contrastive pretraining with deterministic pseudo-label generation and U-Net refinement to fully automate fossil CT segmentation without manual annotation. Using 50,626 CT images from the Middle Jurassic Kilmaluag Formation spanning amphibians, reptiles, dinosaurs, and early mammals, the framework achieved a Dice coefficient of 93.66% and IoU of 82.42% on a held-out specimen not seen during training, comparable to the highest Dice and IoU values reported in recent deep learning-based fossil CT segmentation studies. Cross-taxon generalisation was validated geometrically on six fully external specimens, achieving sub-voxel mesh agreement with manually thresholded references. By eliminating the annotation requirement that has limited prior deep learning approaches in palaeontology, this framework reduces per-specimen processing from ~100 person-hours to 6 hrs of one-time U-Net training plus 1-3 minutes of mesh generation per specimen, an essential first step towards batch processing of CT data for large-scale comparative and quantitative analyses.
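For readers unfamiliar with the two overlap metrics quoted above, the following is a minimal sketch of how a Dice coefficient and an IoU are commonly computed for a pair of binary masks. It is not the authors' evaluation code: the function name `dice_and_iou`, the flattened-list representation, the toy masks, and the choice to score two empty masks as perfect agreement are all assumptions made for illustration.

```python
from typing import Sequence, Tuple


def dice_and_iou(pred: Sequence[int], truth: Sequence[int]) -> Tuple[float, float]:
    """Return (Dice, IoU) for two flattened binary masks of equal length.

    Hypothetical helper for illustration; 1 marks bone, 0 marks rock matrix.
    """
    if len(pred) != len(truth):
        raise ValueError("masks must contain the same number of pixels")

    # Count pixels labelled foreground in both masks, and in each mask alone.
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    pred_size = sum(1 for p in pred if p)
    truth_size = sum(1 for t in truth if t)
    union = pred_size + truth_size - intersection

    if union == 0:
        # Both masks are empty. Scoring this as perfect agreement is an
        # assumed convention, not something stated in the abstract.
        return 1.0, 1.0

    dice = 2 * intersection / (pred_size + truth_size)
    iou = intersection / union
    return dice, iou


# Toy example: an 8-pixel predicted mask against a reference mask.
predicted = [1, 1, 1, 0, 0, 0, 1, 0]
reference = [1, 1, 0, 0, 0, 0, 1, 1]
dice, iou = dice_and_iou(predicted, reference)
print(f"Dice = {dice:.3f}, IoU = {iou:.3f}")  # prints "Dice = 0.750, IoU = 0.600"
```

For a single pair of masks the two metrics are tied together by IoU = Dice / (2 - Dice), as the toy example shows (0.75 / 1.25 = 0.6). Values averaged over many slices or specimens need not satisfy this relation exactly.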

Authors

  • Roy, A.
  • Ghosh, P.
  • Weston, F.
  • Hartley, B.
  • Salili-James, A.
  • Poon, S. T. S.
  • Maidment, S. C. R.
  • Butler, R. J.

Categories