The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets.

Journal: JMIR medical informatics

PMID: 39874567

Abstract

BACKGROUND: In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models' outputs. As a standard, categorical data, such as patients' gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population.

Authors

Theresa Willem

Institute of History and Ethics in Medicine, Department of Preclinical Medicine, TUM School of Medicine and Health, Technical University of Munich, Ismaninger Straße 22, 81675, Munich, Germany. theresa.willem@tum.de.

Alessandro Wollek

Munich Institute of Biomedical Engineering, Technical University of Munich, Garching near Munich, Germany.

Theodor Cheslerean-Boghiu

Munich Institute of Biomedical Engineering, School of Computation, Information, and Technology, Technical University of Munich, Munich, Germany.

Martha Kenney

Women & Gender Studies, San Francisco State University, San Francisco, CA, United States.

Alena Buyx

Institute for History and Ethics of Medicine, Technical University of Munich School of Medicine, Technical University of Munich, Munich, Germany.

Keywords

Brazil; Datasets as Topic; Humans; Machine Learning

