OralEpitheliumDB: A Dataset for Oral Epithelial Dysplasia Image Segmentation and Classification.

Journal: Journal of Imaging Informatics in Medicine
PMID:

Abstract

Early diagnosis of potentially malignant disorders, such as oral epithelial dysplasia, is the most reliable way to prevent oral cancer. Computational algorithms have been used as auxiliary tools to support specialists in this process. However, experiments are usually performed on private data, which makes the results difficult to reproduce. Several public datasets of histological images exist, but studies focused on oral dysplasia images rely on inaccessible datasets, which hinders the improvement of algorithms aimed at this lesion. This study introduces an annotated public dataset of oral epithelial dysplasia tissue images. The dataset includes 456 images acquired from 30 mouse tongues. The images were categorized by lesion grade, and nuclear structures were manually marked by a trained specialist and validated by a pathologist. Experiments were also carried out to illustrate the potential of the proposed dataset in the classification and segmentation tasks commonly explored in the literature. Convolutional neural network (CNN) models for semantic and instance segmentation were applied to the images, which were pre-processed with stain normalization methods. The segmented and non-segmented images were then classified with CNN architectures and machine learning algorithms. The data obtained through these processes are available in the dataset. The segmentation stage reached an F1-score of 0.83, obtained with the U-Net model using ResNet-50 as a backbone. In the classification stage, the best result was achieved with the Random Forest method, with an accuracy of 94.22%. The results show that segmentation contributed to the classification results, but further studies are needed to improve these stages of automated diagnosis. The original, gold standard, normalized, and segmented images are publicly available and may be used to improve clinical applications of CAD methods on oral epithelial dysplasia tissue images.
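The two metrics reported above, the F1-score for segmentation and accuracy for classification, can be illustrated with a minimal sketch. It assumes a pixel-wise F1-score on binary nuclei masks, since the abstract does not state whether the score was computed per pixel or per object. The function names, the toy masks, and the integer grade labels are hypothetical and are not taken from the dataset.

```python
def f1_score_masks(pred, truth):
    """Pixel-wise F1-score between two binary masks given as nested lists.

    Assumption: 1 marks a nucleus pixel and 0 marks background.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # nucleus pixel found by the model
            elif p and not t:
                fp += 1          # background marked as nucleus
            elif t and not p:
                fn += 1          # nucleus pixel missed by the model
    denom = 2 * tp + fp + fn
    # Convention chosen here: two empty masks count as perfect agreement.
    return 1.0 if denom == 0 else 2 * tp / denom


def accuracy(pred_labels, true_labels):
    """Fraction of images whose predicted grade equals the reference grade."""
    if len(pred_labels) != len(true_labels):
        raise ValueError("label lists must have the same length")
    correct = sum(p == t for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


# Toy gold-standard mask and a hypothetical model prediction.
truth_mask = [[0, 1, 1, 0],
              [0, 1, 1, 0],
              [0, 0, 0, 0]]
pred_mask = [[0, 1, 1, 0],
             [0, 1, 0, 0],
             [0, 1, 0, 0]]

# Hypothetical lesion grades encoded as integers.
true_grades = [0, 1, 2, 3]
pred_grades = [0, 1, 2, 2]

print(f1_score_masks(pred_mask, truth_mask))  # 0.75
print(accuracy(pred_grades, true_grades))     # 0.75
```

In the toy masks there are 3 true positives, 1 false positive, and 1 false negative, so the F1-score is 2·3 / (2·3 + 1 + 1) = 0.75.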

Authors

  • Adriano Barbosa Silva
    Faculty of Computer Science (FACOM) - Federal University of Uberlândia (UFU), Av. João Naves de Ávila 2121, BLB, 38400-902, Uberlândia, MG, Brazil. Electronic address: adrianobs@gmail.com.
  • Alessandro Santana Martins
    Federal Institute of Triângulo Mineiro, R. Belarmino Vilela Junqueira S/N, 38305-200 Ituiutaba, Minas Gerais, Brazil. Electronic address: alessandro@iftm.edu.br.
  • Thaína Aparecida Azevedo Tosta
    Science and Technology Institute, Federal University of São Paulo (UNIFESP), Av. Cesare Mansueto Giulio Lattes, 1201, 12247-014, São José dos Campos, SP, Brazil.
  • Adriano Mota Loyola
    School of Dentistry, Federal University of Uberlândia (UFU), Av. Pará - 1720, 38405-320, Uberlândia, MG, Brazil.
  • Sérgio Vitorino Cardoso
    School of Dentistry, Federal University of Uberlândia (UFU), Av. Pará - 1720, 38405-320, Uberlândia, MG, Brazil.
  • Leandro Alves Neves
    Department of Computer Science and Statistics, São Paulo State University, R. Cristóvão Colombo, 2265, 15054-000 São José do Rio Preto, São Paulo, Brazil. Electronic address: neves.leandro@gmail.com.
  • Paulo Rogério de Faria
    Department of Histology and Morphology, Institute of Biomedical Science, Federal University of Uberlândia, Av. Amazonas, S/N, 38405-320 Uberlândia, Minas Gerais, Brazil. Electronic address: paulo.faria@ufu.br.
  • Marcelo Zanchetta do Nascimento
    Center of Mathematics, Computing and Cognition, Federal University of ABC, Av. dos Estados, 5001, 09210-580 Santo André, São Paulo, Brazil; Faculty of Computer Science, Federal University of Uberlândia, Av. João Naves de Ávila, 2121, 38400-902 Uberlândia, Minas Gerais, Brazil. Electronic address: marcelo.zanchetta@gmail.com.