AI Bias and Confounding Risk in Health Feature Engineering for Machine Learning Classification Task.
Journal:
Studies in health technology and informatics
Published Date:
Aug 7, 2025
Abstract
Recent advancements in machine learning bring unique opportunities in health fields but also pose considerable challenges. Due to stringent ethical considerations and resource constraints, health data can vary in scope, population coverage, and collection granularity, making it prone to different AI bias and confounding risks in the performance of a classification task. This experimental study explored the impact of hidden confounding risk on model performance in a cardiovascular readmission prediction task using real-life health data from 'Data-derived Risk assessment using the Electronic medical record through Application of Machine Learning' (DREAM). Five commonly used machine learning models, k-nearest neighbors (KNN), random forest (RF), decision tree (DT), CatBoost and XGBoost, were selected for this task. Model performance was assessed via the area under the receiver operating characteristics curve (AUC) and F1 score, both before and after propensity score adjustment. Based on a density plot comparison before and after adjustment, the difference was mainly contributed by patients aged between 20 and 40. High fluctuation in model performance was noted when including and excluding patients in this age group. Further reasoning suggested that high-risk pregnancy may serve as a confounding factor in the original model generation: the pregnancy rate in the non-readmitted group was significantly higher than that in the readmitted group (χ² = 10.2, p < 0.001). However, pregnancy status required an additional information query from a different hospital system. Without careful consideration of confounding risks, a traditional pipeline may generate a less robust classifier in the clinical setting. Incorporating propensity score matching could be a solution to randomise invisible confounding factors between the classes.
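The propensity score adjustment described in the abstract could be sketched as follows. This is a minimal illustration on synthetic data, not the DREAM pipeline: the hidden binary confounder (standing in for pregnancy status), the covariates, the greedy 1:1 nearest-neighbour matching scheme, and the random forest evaluation are all illustrative assumptions.

```python
import numpy as np
from scipy.stats import chi2_contingency
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic cohort: age plus a hidden binary confounder that is far more
# common in younger patients (a stand-in for high-risk pregnancy status).
n = 2000
age = rng.uniform(20, 80, n)
confounder = (rng.random(n) < np.where(age < 40, 0.3, 0.02)).astype(int)

# Readmission depends on age and, negatively, on the hidden confounder.
logit = 0.03 * (age - 50) - 1.5 * confounder
readmit = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)
X = age.reshape(-1, 1)  # the confounder is NOT available as a feature

# Chi-square check: is the confounder distributed differently by class?
table = [
    [np.sum((readmit == 1) & (confounder == 1)), np.sum((readmit == 1) & (confounder == 0))],
    [np.sum((readmit == 0) & (confounder == 1)), np.sum((readmit == 0) & (confounder == 0))],
]
chi2, p, _, _ = chi2_contingency(table)

# Step 1: estimate propensity scores for class membership from covariates.
ps = LogisticRegression().fit(X, readmit).predict_proba(X)[:, 1]

# Step 2: greedy 1:1 nearest-neighbour matching on the propensity score.
pos = np.where(readmit == 1)[0]
neg = list(np.where(readmit == 0)[0])
matched = []
for i in pos:
    j = min(neg, key=lambda k: abs(ps[k] - ps[i]))
    matched.extend([i, j])
    neg.remove(j)
matched = np.array(matched)

# Step 3: compare classifier performance before and after matching.
def evaluate(idx):
    Xtr, Xte, ytr, yte = train_test_split(
        X[idx], readmit[idx], random_state=0, stratify=readmit[idx]
    )
    clf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
    prob = clf.predict_proba(Xte)[:, 1]
    return roc_auc_score(yte, prob), f1_score(yte, clf.predict(Xte))

auc_all, f1_all = evaluate(np.arange(n))
auc_m, f1_m = evaluate(matched)
print(f"full cohort: AUC={auc_all:.3f} F1={f1_all:.3f}")
print(f"matched    : AUC={auc_m:.3f} F1={f1_m:.3f}")
```

Matching on the propensity score balances the observed covariates between the readmitted and non-readmitted classes, which also tends to equalise any hidden factor correlated with them, so the performance gap before and after matching signals how much the unmatched model relied on the confounded age range.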