Machine learning-based prediction of celiac antibody seropositivity by biochemical test parameters.
Journal:
Scientific Reports
Published Date:
Jul 3, 2025
Abstract
The diagnostic delay in celiac disease (CD) is currently a burden for individuals and society. Biochemical tests may be used for risk identification of CD to reduce this delay, and we aimed to explore prediction models for CD antibody seropositivity. We developed two prediction models in a cohort study using data from primary care in greater Copenhagen (2006-2015). All patients with CD antibody tests were included. Two candidate sets of predictors were considered: (1) all blood tests measured, and (2) tests deemed clinically relevant pre-study or previously studied. Both models assessed test results 5 years before CD testing. We developed and evaluated prediction models in a 10-fold cross-validation framework for each set of predictors. Four machine learning methods were combined in stacked models using SuperLearner. In total, 54,877 patients were included, of whom 672 were CD antibody seropositive. The cross-validated estimated areas under the curve were 0.68 and 0.63. Distributions of predicted probabilities overlapped substantially between patients with CD antibody seropositivity and those with seronegativity. Food allergen antibody and IgA were the most important predictors. Biochemical tests had low predictive power but provided methodological insights for future models. Future models may improve by combining biochemical tests with other clinical information but should preferably remain clinically implementable.
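To make the reported figures concrete, the following minimal Python sketch computes the seropositivity prevalence from the counts in the abstract and shows how an area under the curve can be read as a rank statistic. It assumes the reported area under the curve refers to the ROC curve, in which case it equals the probability that a randomly chosen seropositive patient receives a higher predicted probability than a randomly chosen seronegative patient. The function name `auc` and the toy scores are hypothetical and invented for illustration; they are not data or code from the study, which used SuperLearner.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Pairwise (Mann-Whitney) estimate of the ROC AUC.

    Returns the fraction of positive/negative pairs in which the
    positive case has the higher score; ties count as half a win.
    """
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Counts reported in the abstract.
n_total = 54877
n_positive = 672
prevalence = n_positive / n_total  # roughly 1.2% of tested patients

# Hypothetical predicted probabilities, invented purely for illustration.
seropositive = [0.010, 0.015, 0.020, 0.030, 0.012]
seronegative = [0.008, 0.011, 0.014, 0.018, 0.025, 0.009]

example_auc = auc(seropositive, seronegative)
print(f"prevalence = {prevalence:.4f}")
print(f"AUC on toy scores = {example_auc:.2f}")
```

In the toy data the seropositive score is higher in 20 of the 30 pairs, giving an AUC of about 0.67, close to the 0.68 reported for the better model. The two toy score ranges overlap heavily, which is the pattern the abstract describes for the predicted probabilities.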