Characteristics of publicly available skin cancer image datasets: a systematic review.

Journal: The Lancet. Digital health

Abstract

Publicly available skin image datasets are increasingly used to develop machine learning algorithms for skin cancer diagnosis. However, the total number of datasets and their respective content are currently unclear. This systematic review aimed to identify and evaluate all publicly available skin image datasets used for skin cancer diagnosis by exploring their characteristics, data access requirements, and associated image metadata. A combined MEDLINE, Google, and Google Dataset search identified 21 open access datasets containing 106 950 skin lesion images, 17 open access atlases, eight regulated access datasets, and three regulated access atlases. Images and accompanying data from open access datasets were evaluated by two independent reviewers. Among the 14 datasets that reported country of origin, most (11 [79%]) originated from Europe, North America, and Oceania exclusively. Most datasets (19 [90%]) contained dermoscopic images or macroscopic photographs only. Clinical information was available regarding age for 81 662 images (76·4%), sex for 82 848 (77·5%), and body site for 79 561 (74·4%). Subject ethnicity data were available for 1415 images (1·3%), and Fitzpatrick skin type data for 2236 (2·1%). There was limited and variable reporting of characteristics and metadata among datasets, with substantial under-representation of darker skin types. This is the first systematic review to characterise publicly available skin image datasets, highlighting limited applicability to real-life clinical settings and restricted population representation, precluding generalisability. Quality standards for characteristics and metadata reporting for skin image datasets are needed.

Authors

  • David Wen
    Oxford University Clinical Academic Graduate School, University of Oxford, Oxford, UK; Institute of Clinical Sciences, University of Birmingham, Birmingham, UK; Royal Berkshire Hospital, Royal Berkshire NHS Foundation Trust, Reading, UK.
  • Saad M Khan
    Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
  • Antonio Ji Xu
    Department of Dermatology, Churchill Hospital, Oxford University Hospitals NHS Foundation Trust, Oxford, UK.
  • Hussein Ibrahim
    Academic Unit of Ophthalmology, Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, B15 2TT, UK.
  • Luke Smith
    Databiology, Oxford, UK.
  • Jose Caballero
  • Luis Zepeda
    Databiology, Oxford, UK.
  • Carlos de Blas Perez
    Databiology, Oxford, UK.
  • Alastair K Denniston
    Centre for Patient Reported Outcomes Research, Institute of Applied Health Research, University of Birmingham, Birmingham, UK.
  • Xiaoxuan Liu
    Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK.
  • Rubeta N Matin
    Oxford University Hospitals NHS Foundation Trust, Oxford, UK.