An intentional approach to managing bias in general purpose embedding models.

Journal: The Lancet Digital Health
Published Date:

Abstract

Advances in machine learning for health care have raised concerns in the research community about bias, specifically the introduction, perpetuation, or exacerbation of care disparities. Reinforcing these concerns is the finding that medical images often reveal signals about sensitive attributes in ways that are hard for both algorithms and people to pinpoint. This finding raises the question of how best to design general purpose pretrained embeddings (GPPEs, defined as embeddings meant to support a broad array of use cases) for building downstream models that are free from particular types of bias. The downstream model should be carefully evaluated for bias, and audited and improved as appropriate. However, in our view, well intentioned attempts to prevent the upstream components (the GPPEs) from learning sensitive attributes can have unintended consequences for the downstream models. Despite producing a veneer of technical neutrality, the resultant end-to-end system might still be biased or perform poorly. Building on previously published data, we present reasons why GPPEs should ideally contain as much information as the original data contain, and highlight the perils of trying to remove sensitive attributes from a GPPE. We also emphasise that downstream prediction models trained for specific tasks and settings, whether developed using GPPEs or not, should be carefully designed and evaluated to avoid bias that makes models vulnerable to issues such as distributional shift. These evaluations should be done by a diverse team, including social scientists, on a diverse cohort representing the full breadth of the patient population for which the final model is intended.

Authors

  • Wei-Hung Weng
    Department of Biomedical Informatics, Harvard Medical School, 10 Shattuck Street, 4th Floor, Boston, MA, 02115, USA. ckbjimmy@mit.edu.
  • Andrew Sellergen
    Google, Mountain View, CA, USA.
  • Atilla P Kiraly
    Google AI, Mountain View, CA, USA.
  • Alexander D'Amour
    Google, Mountain View, CA, USA.
  • Jungyeon Park
    Google, Mountain View, CA, USA.
  • Rory Pilgrim
    From Google Health, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 (S.K., J.Y., S.J., R.P., Z.N., C.C., N.B., S.M.M., T.H., A.P.K., G.S.C., L.P., K.C., P.H.C.C., Y.L., K.E., D.T., S.S., S.P.); Advanced Clinical, Deerfield, Ill (C.L.); Apollo Radiology International, Hyderabad, India (S.R.K.); TB Department, Center of Infectious Disease Research in Zambia, Lusaka, Zambia (M.M.); Sibanye Stillwater, Weltevreden Park, Roodepoort, South Africa (J.M.); and Clickmedix, Gaithersburg, Md (T.S.).
  • Stephen Pfohl
  • Charles Lau
    From Google Health, Google, 3400 Hillview Ave, Palo Alto, CA 94304 (A.B.S., C.C., Z.N., Y. Liu, K.E., D.T., N.B., S.S.); Google Research, Cambridge, Mass (Y. Li, A.M., A.S., J.H., D.K.); Google via Advanced Clinical, Deerfield, Ill (C.L.); Apollo Radiology International, Hyderabad, India (S.R.K.); and Northwestern Medicine, Chicago, Ill (M.E., F.G.V., D.M.).
  • Vivek Natarajan
    Google, Mountain View, CA, USA.
  • Shekoofeh Azizi
  • Alan Karthikesalingam
    Department of Outcomes Research, St George's Vascular Institute, London, SW17 0QT, United Kingdom.
  • Heather Cole-Lewis
    ICF International, Rockville, MD, United States.
  • Yossi Matias
    Google Research, Google LLC, 1600 Amphitheatre Parkway, Mountain View, CA, USA.
  • Greg S Corrado
    Google Health, Palo Alto, CA, USA.
  • Dale R Webster
    Google Inc, Mountain View, California.
  • Shravya Shetty
    Google AI, Mountain View, CA, USA.
  • Shruthi Prabhakara
    From Google Health, 1600 Amphitheatre Pkwy, Mountain View, CA 94043 (S.K., J.Y., S.J., R.P., Z.N., C.C., N.B., S.M.M., T.H., A.P.K., G.S.C., L.P., K.C., P.H.C.C., Y.L., K.E., D.T., S.S., S.P.); Advanced Clinical, Deerfield, Ill (C.L.); Apollo Radiology International, Hyderabad, India (S.R.K.); TB Department, Center of Infectious Disease Research in Zambia, Lusaka, Zambia (M.M.); Sibanye Stillwater, Weltevreden Park, Roodepoort, South Africa (J.M.); and Clickmedix, Gaithersburg, Md (T.S.).
  • Krish Eswaran
    From Google Health, Google, 3400 Hillview Ave, Palo Alto, CA 94304 (A.B.S., C.C., Z.N., Y. Liu, K.E., D.T., N.B., S.S.); Google Research, Cambridge, Mass (Y. Li, A.M., A.S., J.H., D.K.); Google via Advanced Clinical, Deerfield, Ill (C.L.); Apollo Radiology International, Hyderabad, India (S.R.K.); and Northwestern Medicine, Chicago, Ill (M.E., F.G.V., D.M.).
  • Leo A G Celi
    Institute for Medical Engineering and Science, Massachusetts Institute of Technology, Cambridge, MA, USA; Department of Medicine, Beth Israel Deaconess Medical Center, Boston, MA, USA.
  • Yun Liu
    Google Health, Palo Alto, CA, USA.