Integrating snapshot ensemble learning into masked autoencoders for efficient self-supervised pretraining in medical imaging.

Journal: Scientific Reports
Published Date:

Abstract

Self-supervised learning (SSL) has gained significant attention in medical imaging for its ability to leverage large amounts of unlabeled data for effective model pretraining. Among SSL methods, the masked autoencoder (MAE) has proven robust in learning rich representations by reconstructing masked patches of input data. However, pretraining MAE models typically demands substantial computational resources, especially when multiple MAE models are independently trained for ensemble predictions. This study introduces the Snap-MAE model, which integrates snapshot ensemble learning into the MAE pretraining process to optimize computational efficiency and enhance performance. The Snap-MAE model employs a cyclic cosine scheduler to periodically adjust the learning rate, enabling the capture of diverse model representations within a single training run and the systematic "snapshotting" of models at regular intervals. These snapshot models are then fine-tuned on labeled data and ensembled to generate final predictions. Extensive experiments on two medical imaging tasks, multi-label pediatric thoracic disease classification and cardiovascular disease diagnosis, demonstrate that Snap-MAE consistently outperforms the vanilla MAE, ViT-S, and ResNet-34 models across all performance metrics. Moreover, by producing multiple pretrained models from a single pretraining phase, Snap-MAE reduces the computational burden typically associated with ensemble learning. Its straightforward implementation and effectiveness make Snap-MAE a practical solution for improving SSL-based pretraining in medical imaging, where labeled data and computational resources are often limited.
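
The core mechanism described above, a cyclic cosine learning-rate schedule with a weight snapshot taken at the end of each cycle, can be illustrated with a minimal sketch. The following PyTorch-style example is a hypothetical illustration, not the authors' implementation: the MAE backbone, patch masking, and reconstruction loss are replaced by placeholders, and names such as cyclic_cosine_lr, n_cycles, and snapshots are assumptions introduced here for clarity.

    # Minimal sketch of snapshot-ensemble pretraining with a cyclic cosine
    # learning-rate schedule, assuming a generic PyTorch training loop.
    # The MAE encoder/decoder and masked-patch objective are stand-ins.
    import math
    import copy
    import torch
    import torch.nn as nn

    def cyclic_cosine_lr(base_lr, step, total_steps, n_cycles):
        """Cyclic cosine schedule: the LR anneals toward zero within each
        cycle, then restarts at base_lr at the start of the next cycle."""
        steps_per_cycle = math.ceil(total_steps / n_cycles)
        t = step % steps_per_cycle
        return base_lr / 2 * (math.cos(math.pi * t / steps_per_cycle) + 1)

    # Placeholder network standing in for the MAE; toy unlabeled data below.
    model = nn.Sequential(nn.Linear(64, 128), nn.GELU(), nn.Linear(128, 64))
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
    criterion = nn.MSELoss()  # stand-in for the masked-patch reconstruction loss

    total_steps, n_cycles, base_lr = 300, 3, 1e-3
    snapshots = []  # one state_dict per cycle end ("snapshot")

    for step in range(total_steps):
        # Update the learning rate according to the cyclic cosine schedule.
        lr = cyclic_cosine_lr(base_lr, step, total_steps, n_cycles)
        for group in optimizer.param_groups:
            group["lr"] = lr

        x = torch.randn(32, 64)        # dummy unlabeled batch
        loss = criterion(model(x), x)  # dummy self-reconstruction objective
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

        # Snapshot the weights at the end of each cycle (learning-rate minimum).
        if (step + 1) % math.ceil(total_steps / n_cycles) == 0:
            snapshots.append(copy.deepcopy(model.state_dict()))

    print(f"Collected {len(snapshots)} snapshot models from one pretraining run.")

Under this scheme, a single pretraining run yields n_cycles pretrained checkpoints; each snapshot would then be fine-tuned on the labeled downstream task and the resulting models ensembled (for example, by averaging predicted probabilities), which is how the cost of an ensemble is amortized over one pretraining phase.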

Authors

  • Taeyoung Yoon
    Korea University, Republic of Korea.
  • Daesung Kang
    Department of Healthcare Information Technology, Inje University, Gimhae, Republic of Korea.