Best Practices on Big Data Analytics to Address Sex-Specific Biases in Our Understanding of the Etiology, Diagnosis, and Prognosis of Diseases.

Journal: Annual review of biomedical data science
Abstract

A bias in health research toward understanding diseases as they present in men can gravely affect the health of women. This paper reports a conceptual review of the literature on machine learning and natural language processing (NLP) techniques used to interrogate big data to identify sex-specific health disparities. In October 2021, we searched Ovid MEDLINE, Embase, and PsycINFO using synonyms and indexing terms for three concept groups: "women," "men," or "sex"; "big data," "artificial intelligence," or "NLP"; and "disparities" or "differences." Of 902 records retrieved, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that reporting of participants by sex is inconsistent and often absent, and that men are included in these studies disproportionately less often than women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to exploit unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but progress toward correcting it is slow. We reflect on best practices for using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.

Authors

  • Su Golder
    Department of Health Sciences, University of York, York, United Kingdom.
  • Karen O'Connor
    Department of Biomedical Informatics, Arizona State University, Scottsdale, AZ, USA.
  • Yunwen Wang
    Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA.
  • Robin Stevens
    Annenberg School for Communication and Journalism, University of Southern California, Los Angeles, California, USA.
  • Graciela Gonzalez-Hernandez
    Health Language Processing Center, Institute for Biomedical Informatics at the Perelman School of Medicine, University of Pennsylvania, Philadelphia, PA 19104, USA.