Out-of-distribution reject option method for dataset shift problem in early disease onset prediction.

Journal: Scientific Reports

Published Date: Jun 2, 2025

Abstract

Machine learning is increasingly used to predict the onset of lifestyle-related diseases from health and medical data. However, its predictive accuracy in practice is often hindered by dataset shift, that is, discrepancies in data distribution between the training and testing datasets. This issue leads to the misclassification of out-of-distribution (OOD) data. To mitigate the effect of dataset shift in real-world settings, this paper proposes the out-of-distribution reject option for prediction (ODROP). This method integrates an OOD detection model to exclude OOD data from the prediction phase. We used two real-world health checkup datasets (Hirosaki and Wakayama) with dataset shift, across three disease onset prediction tasks: diabetes, dyslipidemia, and hypertension. Both components of the ODROP method, the OOD detection model and the prediction model, were trained on the Hirosaki dataset. We assessed the effectiveness of ODROP on the Wakayama dataset using AUROC-rejection rate curve plots. Among the five OOD detection approaches (the variational autoencoder, neural network ensemble std, neural network ensemble epistemic, neural network energy, and neural network Gaussian mixture based energy measurement), the variational autoencoder method demonstrated notably higher stability and a greater improvement in AUROC. For example, in the Wakayama dataset, the AUROC for diabetes onset increased from 0.80 without ODROP to 0.90 at a 31.1% rejection rate, and for dyslipidemia, it improved from 0.70 without ODROP to 0.76 at a 34% rejection rate. In addition, using SHAP clustering, we were able to categorize dataset shifts into two types: those that considerably affect predictions and those that do not. We expect that this classification will help standardize measuring instruments. This study is the first to apply OOD detection to actual health and medical data, demonstrating its potential to substantially improve the accuracy and reliability of disease prediction models amidst dataset shift.
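
The evaluation described in the abstract (reject the samples that the OOD detector scores as most out-of-distribution, then measure AUROC on what remains) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `auroc` and `auroc_rejection_curve`, the rounding rule for the number of rejected samples, and the toy numbers are all assumptions made for illustration, and the OOD scores are simply given here rather than produced by a variational autoencoder or any of the other detectors named above.

```python
from typing import List, Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auroc_rejection_curve(
    labels: Sequence[int],
    pred_scores: Sequence[float],
    ood_scores: Sequence[float],
    rejection_rates: Sequence[float],
) -> List[Tuple[float, float]]:
    """Return (rejection rate, AUROC on retained samples) pairs.

    Samples with the highest OOD score are rejected first (assumed convention:
    a larger score means "more out-of-distribution").
    """
    # Indices ordered from most in-distribution to most out-of-distribution.
    order = sorted(range(len(labels)), key=lambda i: ood_scores[i])
    curve = []
    for rate in rejection_rates:
        n_reject = int(round(rate * len(order)))
        kept = order[: len(order) - n_reject]
        kept_labels = [labels[i] for i in kept]
        kept_scores = [pred_scores[i] for i in kept]
        curve.append((rate, auroc(kept_labels, kept_scores)))
    return curve


# Synthetic toy data (NOT from the Hirosaki or Wakayama datasets).
labels = [0, 0, 1, 1, 0, 1, 1, 0]                        # true onset labels
pred_scores = [0.1, 0.3, 0.8, 0.9, 0.2, 0.7, 0.2, 0.9]   # prediction model outputs
ood_scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.3, 0.9, 0.8]    # OOD detector outputs

curve = auroc_rejection_curve(labels, pred_scores, ood_scores, [0.0, 0.25])
for rate, value in curve:
    print(f"rejection rate {rate:.2f}: AUROC {value:.4f}")
```

In this toy example the two samples with the highest OOD scores are also the two that the prediction model gets wrong, so rejecting them raises the AUROC. Real data will not separate this cleanly, which is why the abstract reports the whole AUROC-rejection rate curve and not a single operating point.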

Authors

Taisei Tosaki

Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Eiichiro Uchino

Department of Medical Intelligent Systems, Graduate School of Medicine, Kyoto University, Kyoto, Japan; Department of Nephrology, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Ryosuke Kojima

Department of Biomedical Data Intelligence, Kyoto University Graduate School of Medicine, Sakyo-ku, Kyoto, Kyoto, Japan.
Yohei Mineharu

Department of Artificial Intelligence in Healthcare and Medicine, Kyoto University Graduate School of Medicine, Kyoto, Japan.
Yuji Okamoto

Graduate School of Medicine, Kyoto University, Shogoin-Kawaharacho, Sakyo-ku, Kyoto 606-8507, Japan.
Mikio Arita

Graduate School of Health and Nursing Science, Wakayama Medical University, Wakayama, Japan.
Nobuyuki Miyai

Graduate School of Health and Nursing Science, Wakayama Medical University, Wakayama, Japan.
Yoshinori Tamada

Department of Medical Intelligent Systems, Graduate School of Medicine, Kyoto University, Kyoto, Japan.
Tatsuya Mikami

Department of Gastroenterology and Hematology, Hirosaki University Graduate School of Medicine, 5 Zaifu-cho, Hirosaki, 036-8562, Japan.
Koichi Murashita

Center of Innovation Research Initiatives Organization, Hirosaki University, Hirosaki, Japan.
Shigeyuki Nakaji

Department of Social Health, Hirosaki University Graduate School of Medicine, Hirosaki, Japan.
Yasushi Okuno

Graduate School of Medicine, Kyoto University, Shogoin-kawaharacho, Sakyo-ku, Kyoto, 606-8507, Japan.

Keywords

Datasets as Topic; Diabetes Mellitus; Dyslipidemias; Female; Humans; Hypertension; Machine Learning; Male; Neural Networks, Computer

External Resources

PubMed (40456759)
