HVUNet: A hybrid vision transformer-based UNet for accurate detection and localization in histopathology images.

Journal: Computers in biology and medicine
Published Date:

Abstract

Precise identification of objects of interest (OoIs) in histopathology images plays a vital role in cancer diagnosis and prognosis. Despite advances in digital pathology, detecting specific cellular structures within these images remains a significant challenge because of the inherent complexity and variability of cell morphology. Many cellular structures share visual characteristics such as color, shape, and texture, making them difficult to distinguish from one another. Certain OoIs are also much smaller than the surrounding cells, which makes manual detection both challenging and error-prone. This paper introduces a hybrid vision transformer-based UNet (HVUNet), a novel model designed to identify and localize OoIs in histopathology images. To improve detection, the proposed model combines UNet with vision transformers (ViTs) within an encoder-decoder architecture. We evaluate HVUNet on the GZMH dataset, which contains histopathology images annotated for mitosis detection, and on the Lymphocyte Detection (LD) dataset for lymphocyte cell detection. Through comprehensive experiments, we demonstrate that HVUNet surpasses several state-of-the-art models, including CNN variants, ViT-based models, and hybrid CNN-ViT architectures. On the GZMH dataset, HVUNet outperforms traditional models such as UNet and more recent models such as UNETR and AttentionUNet, with a precision of 0.94, a recall of 0.60, and an F1-score of 0.72. It also attains an Intersection over Union (IoU) score of 0.76 and a mean Average Precision (mAP) of 0.81 on this dataset, underscoring its effectiveness in detecting mitotic cells. On the LD dataset, the model achieves an F1-score of 0.76, an IoU of 0.63, and a mAP of 0.75, demonstrating its effectiveness in detecting lymphocyte cells. To evaluate generalizability, we tested HVUNet on the MIDOG 2021 and PanopTILs datasets and observed competitive performance, indicating robustness and applicability across diverse histopathology image analysis tasks.
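
For readers less familiar with the metrics quoted in the abstract, the following is a minimal sketch of how F1-score and IoU are commonly defined. It is not the authors' evaluation code: the function names `f1_score` and `mask_iou` are hypothetical, and treating IoU as a pixel-wise overlap between binary masks is an assumption, since the abstract does not state how detections were matched or scored.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def mask_iou(pred, target) -> float:
    """Pixel-wise IoU of two equally sized binary masks (nested lists of 0/1).

    Assumption: IoU is computed on segmentation masks; the abstract does not
    say whether masks or bounding boxes were used.
    """
    intersection = 0
    union = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            intersection += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Two empty masks have no union; report 0.0 rather than divide by zero.
    return intersection / union if union else 0.0


if __name__ == "__main__":
    # Rounded GZMH precision and recall taken from the abstract.
    print(round(f1_score(0.94, 0.60), 4))

    # Toy 2x3 masks (illustrative values only, not data from the paper).
    pred = [[0, 1, 1], [0, 1, 0]]
    target = [[0, 1, 0], [0, 1, 1]]
    print(mask_iou(pred, target))
```

Applied to the rounded GZMH precision and recall, the harmonic mean comes out at roughly 0.73, close to the reported F1-score of 0.72. The small gap may reflect rounding of the published figures or a different averaging scheme, which the abstract does not specify.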

Authors

  • Anusree Kanadath
    Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai International Academic City, 345055, Dubai, United Arab Emirates.
  • Angel Arul Jothi J
    Department of Computer Science, Birla Institute of Technology and Science Pilani, Dubai Campus, Dubai International Academic City, 345055, Dubai, United Arab Emirates. Electronic address: angeljothi@dubai.bits-pilani.ac.in.
  • Siddhaling Urolagin
    Department of Experimental Medical Science, BMC B13, Lund University, SE-22 184 Lund, Sweden. siddhaling@dubai.bits-pilani.ac.in.