The Social Construction of Categorical Data: Mixed Methods Approach to Assessing Data Features in Publicly Available Datasets.
Journal:
JMIR medical informatics
PMID:
39874567
Abstract
BACKGROUND: In data-sparse areas such as health care, computer scientists aim to leverage as much available information as possible to increase the accuracy of their machine learning models' outputs. As a standard, categorical data, such as patients' gender, socioeconomic status, or skin color, are used to train models in fusion with other data types, such as medical images and text-based medical information. However, the effects of including categorical data features for model training in such data-scarce areas are underexamined, particularly regarding models intended to serve individuals equitably in a diverse population.