Revealing transparency gaps in publicly available COVID-19 datasets used for medical artificial intelligence development: a systematic review.

Journal: The Lancet. Digital health
PMID:

Abstract

During the COVID-19 pandemic, artificial intelligence (AI) models were created to address health-care resource constraints. Previous research shows that health-care datasets often have limitations, leading to biased AI technologies. This systematic review assessed datasets used for AI development during the pandemic, identifying several deficiencies. Datasets were identified by screening articles from MEDLINE and using Google Dataset Search. 192 datasets were analysed for metadata completeness, composition, data accessibility, and ethical considerations. Findings revealed substantial gaps: only 48% of datasets documented individuals' country of origin, 43% reported age, and under 25% included sex, gender, race, or ethnicity. Information on data labelling, ethical review, or consent was frequently missing. Many datasets reused data with inadequate traceability. Notably, historical paediatric chest x-rays appeared in some datasets without acknowledgment. These deficiencies highlight the need for better data quality and transparent documentation to lessen the risk that biased AI models are developed in future health emergencies.

Authors

  • Joseph E Alderman
    University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
  • Maria Charalambides
    College of Medical & Dental Sciences, University of Birmingham, Birmingham, UK.
  • Gagandeep Sachdeva
    The Royal Wolverhampton NHS Trust, UK.
  • Elinor Laws
    Institute of Inflammation and Ageing, College of Medical and Dental Sciences, University of Birmingham, Birmingham, UK.
  • Joanne Palmer
    University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
  • Elsa Lee
    Guy's, King's, & St Thomas' School of Medical Education, King's College London, London, UK.
  • Vaishnavi Menon
    Queen Elizabeth Hospital, University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK.
  • Qasim Malik
    AI Centre for Value Based Healthcare, King's College London, London, UK; Birmingham Women's and Children's NHS Foundation Trust, Birmingham, UK.
  • Sonam Vadera
    University Hospitals Birmingham NHS Foundation Trust, Birmingham, UK; University Hospitals of Leicester NHS Trust, Leicester, UK.
  • Melanie Calvert
    Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK.
  • Marzyeh Ghassemi
    Electrical Engineering and Computer Science, Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, United States.
  • Melissa D McCradden
    Division of Neurosurgery, St. Michael's Hospital, Unity Health Toronto; Dalla Lana School of Public Health, University of Toronto, Toronto, Ont.
  • Johan Ordish
    Medicines and Healthcare Products Regulatory Agency, London, UK.
  • Bilal Mateen
    Wellcome Trust, London, UK.
  • Charlotte Summers
    Division of Anaesthesia, University of Cambridge, Addenbrooke's Hospital, Cambridge, UK.
  • Jacqui Gath
    Patient Partner, Birmingham, UK.
  • Rubeta N Matin
    Oxford University Hospitals NHS Foundation Trust, Oxford, United Kingdom.
  • Alastair K Denniston
    Centre for Patient Reported Outcomes Research, Institute of Applied Health Research, University of Birmingham, Birmingham, UK.
  • Xiaoxuan Liu
    Birmingham Health Partners Centre for Regulatory Science and Innovation, University of Birmingham, Birmingham, UK.