Generative models improve fairness of medical classifiers under distribution shifts.

Journal: Nature Medicine
Published Date:

Abstract

Domain generalization is a ubiquitous challenge for machine learning in healthcare. Model performance in real-world conditions might be lower than expected because of discrepancies between the data encountered during deployment and the data used during development. Underrepresentation of some groups or conditions during model development is a common cause of this phenomenon. This challenge is often not readily addressed by targeted data acquisition and labeling by expert clinicians, which can be prohibitively expensive or practically impossible because of the rarity of conditions or of the available clinical expertise. We hypothesize that advances in generative artificial intelligence can help mitigate this unmet need in a steerable fashion, enriching our training dataset with synthetic examples that address shortfalls of underrepresented conditions or subgroups. We show that diffusion models can automatically learn realistic augmentations from data in a label-efficient manner. We demonstrate that learned augmentations make models more robust and statistically fair both in distribution and out of distribution. To evaluate the generality of our approach, we studied three distinct medical imaging contexts of varying difficulty: (1) histopathology, (2) chest X-ray and (3) dermatology images. Complementing real samples with synthetic ones improved the robustness of models in all three medical tasks and increased fairness by improving the accuracy of clinical diagnosis within underrepresented groups, especially out of distribution.
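
The abstract ties fairness to diagnostic accuracy within underrepresented groups but does not state the exact metric used. As a minimal illustration only, the Python sketch below computes per-group accuracy and the gap between the best- and worst-served groups. The function names, the gap definition, and the toy labels are assumptions made for this example; they are not taken from the paper.

```python
from collections import defaultdict


def per_group_accuracy(y_true, y_pred, groups):
    """Return a dict mapping each group label to its classification accuracy."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {group: correct[group] / total[group] for group in total}


def fairness_gap(accuracy_by_group):
    """Hypothetical fairness measure: best-group minus worst-group accuracy."""
    values = accuracy_by_group.values()
    return max(values) - min(values)


# Toy data (invented for illustration): group "B" stands in for an
# underrepresented subgroup on which the classifier performs worse.
y_true = [1, 0, 1, 1, 0, 1]
y_pred = [1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "B", "B", "B"]

accuracy = per_group_accuracy(y_true, y_pred, groups)
gap = fairness_gap(accuracy)
print(accuracy, gap)
```

Under this assumed definition, a smaller gap after adding synthetic examples for the underrepresented group would correspond to the kind of fairness improvement the abstract describes.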

Authors

  • Ira Ktena
    Google DeepMind, London, UK. iraktena@google.com.
  • Olivia Wiles
    Google DeepMind, London, UK. oawiles@google.com.
  • Isabela Albuquerque
  • Sylvestre-Alvise Rebuffi
    Google DeepMind, London, UK.
  • Ryutaro Tanno
    Centre for Medical Image Computing and Department of Computer Science, UCL, Gower Street, London WC1E 6BT, UK; Healthcare Intelligence, Microsoft Research Cambridge, UK. Electronic address: r.tanno@cs.ucl.ac.uk.
  • Abhijit Guha Roy
    Department of Electrical Engineering, Indian Institute of Technology Kharagpur, West Bengal, India.
  • Shekoofeh Azizi
  • Danielle Belgrave
    Microsoft Research Cambridge, Cambridge, United Kingdom.
  • Pushmeet Kohli
    DeepMind, London, UK.
  • Taylan Cemgil
    Google DeepMind, London, UK.
  • Alan Karthikesalingam
    Department of Outcomes Research, St George's Vascular Institute, London, SW17 0QT, United Kingdom.
  • Sven Gowal
    Google DeepMind, London, UK.