ViT-Stain: Vision transformer-driven virtual staining for skin histopathology via global contextual learning.

Journal: PLOS ONE

Abstract

Current virtual staining approaches for histopathology slides use convolutional neural networks (CNNs) and generative adversarial networks (GANs). These approaches rely on local receptive fields and struggle to capture global context and long-range tissue dependencies. This limitation can introduce artifacts in fine textures and cause the loss of subtle morphological details. We propose a novel vision transformer-driven virtual staining framework (ViT-Stain) that translates unstained skin tissue images into hematoxylin and eosin (H&E)-equivalent images. The transformer's self-attention mechanism enables ViT-Stain to capture long-range dependencies, preserve global context, and maintain fine textures. We trained ViT-Stain on the E-Staining DermaRepo dataset, which pairs unstained and H&E-stained whole-slide images (WSIs). We evaluated the model with SSIM, PSNR, FID, KID, LPIPS, and a novel histology-specific fidelity index (HSFI). Three board-certified pathologists performed the qualitative evaluation. ViT-Stain outperforms leading CNN and GAN models, including Pix2Pix, CycleGAN, CUTGAN, and DCLGAN, and achieves an overall diagnostic concordance of 85% on virtual H&E stains (Fleiss' κ = 0.88). However, the model requires longer training and inference times: about 93 hours of training on A100 GPUs and about 2.9 minutes of inference. Our work advances the reproducibility of AI-driven diagnosis in clinical settings that require high fidelity, and it aligns with the World Health Organization (WHO) global health goals.
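The abstract credits ViT-Stain's advantage over CNNs to the global receptive field of self-attention. The sketch below is a minimal, generic illustration of that property. It implements single-head scaled dot-product self-attention over image patches in pure Python. It is not the ViT-Stain architecture: the patch size, embedding dimension, and weights are not given in the abstract. The helper names (`patchify`, `self_attention`, `random_matrix`), the toy 8×8 image, and the random projection weights are all hypothetical. The demo changes one pixel in the bottom-right patch and shows that the output for the top-left patch also changes. A single 3×3 convolution could not produce that effect.

```python
import math
import random


def patchify(image, patch):
    """Split a 2-D grayscale image (list of equal-length rows) into flattened,
    non-overlapping patch x patch tokens, in row-major patch order."""
    h, w = len(image), len(image[0])
    assert h % patch == 0 and w % patch == 0, "image size must be a multiple of patch"
    tokens = []
    for r in range(0, h, patch):
        for c in range(0, w, patch):
            tokens.append([image[r + i][c + j] for i in range(patch) for j in range(patch)])
    return tokens


def matmul(a, b):
    """Multiply an n x m matrix by an m x p matrix (both given as lists of rows)."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Hypothetical stand-in for learned projection weights (not trained)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention.
    Each output token is a softmax-weighted sum over ALL value vectors,
    so every patch can influence every other patch in one layer."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    d = len(wq[0])
    scores = [[sum(a * b for a, b in zip(qr, kr)) / math.sqrt(d) for kr in k] for qr in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


if __name__ == "__main__":
    rng = random.Random(0)
    img = [[rng.random() for _ in range(8)] for _ in range(8)]  # toy 8x8 "tile"
    # 4x4 patches give 16-dim tokens; project them to 8 dims (assumed sizes).
    wq, wk, wv = (random_matrix(16, 8, rng) for _ in range(3))

    out, attn = self_attention(patchify(img, 4), wq, wk, wv)

    # Perturb one pixel that lies only in the bottom-right patch (token 3).
    img2 = [row[:] for row in img]
    img2[7][7] += 1.0
    out2, _ = self_attention(patchify(img2, 4), wq, wk, wv)

    # The top-left token (token 0) still changes: its receptive field is global.
    delta = max(abs(x - y) for x, y in zip(out[0], out2[0]))
    print(f"max change in top-left token output: {delta:.6f}")
```

Pure Python keeps the sketch dependency-free. A practical vision transformer would add learned weights, multiple heads, positional embeddings, and residual MLP blocks. Such a model would typically be built on a tensor library.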