Explainable label-guided lightweight network with axial transformer encoder for early detection of oral cancer.
Journal:
Scientific Reports
PMID:
39984521
Abstract
Oral cavity cancer exhibits high morbidity and mortality rates, so it is essential to diagnose the disease at an early stage. Machine learning and convolutional neural networks (CNNs) are powerful tools for diagnosing mouth and oral cancer. In this study, we design a lightweight explainable network (LWENet) with label-guided attention (LGA) to provide a second opinion to the expert. The LWENet contains depth-wise separable convolution layers to reduce computational cost. Moreover, the LGA module enforces label consistency among neighboring pixels and improves spatial features. Furthermore, an AMSA (axial multi-head self-attention) based ViT (vision transformer) encoder is incorporated into the model to provide global attention; this encoder is computationally efficient compared to the classical ViT encoder. We evaluated LWENet on the MOD (mouth and oral disease) and OCI (oral cancer image) datasets and compared the results with other CNN- and ViT-based methods. LWENet achieved a precision and F1-score of 96.97% and 98.90% on the MOD dataset, and 99.48% and 98.23% on the OCI dataset, respectively. By incorporating Grad-CAM, we visualize the decision-making process, enhancing model interpretability. This work demonstrates the potential of LWENet with LGA in facilitating early oral cancer detection.
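The efficiency claims in the abstract rest on two standard ideas: depth-wise separable convolutions shrink the parameter count of a convolutional layer, and axial self-attention reduces the number of pairwise attention scores from quadratic in the token count to roughly linear per axis. The sketch below is illustrative only; the layer sizes and feature-map dimensions are hypothetical assumptions, not values from the paper.

```python
# Illustrative cost comparison for the two efficiency techniques the
# abstract mentions. All sizes below are hypothetical examples.

def standard_conv_params(c_in, c_out, k):
    # Standard convolution: one k x k kernel per (input, output) channel pair.
    return c_in * c_out * k * k

def depthwise_separable_params(c_in, c_out, k):
    # Depth-wise: one k x k kernel per input channel,
    # then a 1 x 1 point-wise convolution to mix channels.
    return c_in * k * k + c_in * c_out

def full_attention_scores(h, w):
    # Classical ViT self-attention: every token attends to every token.
    n = h * w
    return n * n

def axial_attention_scores(h, w):
    # Axial attention: each token attends only along its row and column.
    return h * w * (h + w)

# Hypothetical layer: 64 -> 128 channels, 3x3 kernel.
print(standard_conv_params(64, 128, 3))       # 73728
print(depthwise_separable_params(64, 128, 3)) # 8768

# Hypothetical 14 x 14 token grid.
print(full_attention_scores(14, 14))   # 38416
print(axial_attention_scores(14, 14))  # 5488
```

For these example sizes, the separable convolution uses roughly 8x fewer parameters and axial attention computes roughly 7x fewer attention scores, which is the kind of saving that makes a lightweight model like LWENet feasible.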