[Scale-invariant feature-enhanced deep learning framework for oral mucosal lesion segmentation].
Journal:
Zhonghua kou qiang yi xue za zhi = Zhonghua kouqiang yixue zazhi = Chinese journal of stomatology
PMID:
40015705
Abstract
To develop PixelSIFT-UNet, a novel semantic segmentation model that integrates deep learning with the scale-invariant feature transform (SIFT) algorithm to improve the segmentation accuracy of oral mucosal lesions. The study used 838 standard clinical white-light images of oral mucosal diseases acquired from January 2020 to December 2022 at the Stomatology Hospital, Zhejiang University School of Medicine. Images were assigned to subsets at random: the seed was fixed with Python's random.seed function, and the random sample function was then used to draw the samples. The dataset was divided into three subsets in an approximately 6:2:2 ratio: training (n=506), validation (n=166), and testing (n=166). Lesion boundaries were annotated with Labelme software, and a PixelSIFT-UNet-based deep learning model was developed with VGG-16 and ResNet-50 backbone networks. Model parameters were optimized on the validation set, and performance metrics [Dice coefficient, mean intersection over union (mIoU), mean pixel accuracy (mPA), and Precision] were assessed on the test set. The model's performance was benchmarked against conventional semantic segmentation frameworks (U-Net and PSPNet). The PixelSIFT-UNet model segmented three common oral mucosal lesions: oral lichen planus, oral leukoplakia, and oral submucous fibrosis. With VGG-16 as the backbone network, the model achieved Dice coefficient, mIoU, mPA, and Precision values of 0.642, 0.699, 0.836, and 0.792, respectively. With ResNet-50 as the backbone network, the values were 0.668, 0.733, 0.872, and 0.817, exceeding those of the conventional U-Net model (0.662, 0.717, 0.861, and 0.809) on all four metrics and those of the PSPNet model (0.671, 0.721, 0.858, and 0.813) on mIoU, mPA, and Precision; the PSPNet Dice coefficient (0.671) was marginally higher.
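The seeded random split described in the abstract can be sketched as follows. This is only an illustration, not the authors' code: the function name `split_dataset`, the seed value, and the placeholder image identifiers are assumptions, and the subset sizes are passed explicitly because the reported counts (506/166/166) are not an exact 6:2:2 division of 838.

```python
import random


def split_dataset(items, n_val, n_test, seed=0):
    """Shuffle items reproducibly and split them into train/val/test lists.

    The seed value is an assumption; the abstract does not report it.
    """
    # A local generator behaves like random.seed(seed) followed by
    # random.sample(...), without touching the module-level state.
    rng = random.Random(seed)
    shuffled = rng.sample(list(items), k=len(items))  # sampling without replacement
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test


# Hypothetical identifiers standing in for the 838 clinical images.
image_ids = [f"img_{i:04d}" for i in range(838)]
train_ids, val_ids, test_ids = split_dataset(image_ids, n_val=166, n_test=166, seed=0)
print(len(train_ids), len(val_ids), len(test_ids))  # 506 166 166
```

Fixing the seed makes the partition repeatable, so the same test images are held out each time the split is regenerated.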
The proposed PixelSIFT-UNet architecture outperforms the conventional U-Net and PSPNet models on most evaluated metrics for oral mucosal lesion segmentation, improving segmentation accuracy.
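For readers unfamiliar with the four reported metrics, a minimal sketch of how Dice, mIoU, mPA, and Precision are commonly computed from a pixel-level confusion matrix is given below. The abstract does not state how the authors averaged across classes or images, so the macro-averaging over classes used here is an assumption, as are the function names and the toy labels.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count pixels: rows are true classes, columns are predicted classes."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm


def segmentation_metrics(y_true, y_pred, num_classes):
    """Macro-averaged Dice, IoU, pixel accuracy, and precision.

    Averaging over classes is an assumption; a class with a zero
    denominator contributes 0.0 to the mean.
    """
    cm = confusion_matrix(y_true, y_pred, num_classes)

    def ratio(num, den):
        return num / den if den else 0.0

    dice, iou, pa, prec = [], [], [], []
    for c in range(num_classes):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                                  # true c, predicted other
        fp = sum(cm[r][c] for r in range(num_classes)) - tp   # predicted c, true other
        dice.append(ratio(2 * tp, 2 * tp + fp + fn))
        iou.append(ratio(tp, tp + fp + fn))
        pa.append(ratio(tp, tp + fn))    # per-class pixel accuracy
        prec.append(ratio(tp, tp + fp))

    def mean(xs):
        return sum(xs) / len(xs)

    return {"dice": mean(dice), "miou": mean(iou),
            "mpa": mean(pa), "precision": mean(prec)}


# Toy flattened masks (0 = background, 1 = lesion); purely illustrative.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
metrics = segmentation_metrics(y_true, y_pred, num_classes=2)
print(metrics)
```

Per class, Dice equals 2·IoU/(1+IoU) and is therefore never lower than IoU; the reported Dice values being below the mIoU values suggests the two were averaged over different sets of classes (for example, Dice on lesion classes only), which this sketch does not attempt to reproduce.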