Machine learning-based prediction of celiac antibody seropositivity by biochemical test parameters.

Journal: Scientific reports
Published Date:

Abstract

The diagnostic delay in celiac disease (CD) is currently a burden for both the individual and society. Biochemical tests may be used in risk identification of CD to reduce this delay, and we aimed to explore prediction models for CD antibody seropositivity. We developed two prediction models in a cohort study using data from primary care in greater Copenhagen (2006-2015). All patients with CD antibody tests were included. Two candidate sets of predictors were considered: (1) all blood tests measured, and (2) tests deemed clinically relevant pre-study or previously studied. Both models assessed test results from the 5 years before CD testing. For each set of predictors, we developed and evaluated prediction models in a 10-fold cross-validation framework. Four machine learning methods were combined in stacked models using SuperLearner. A total of 54,877 patients were included, of whom 672 were CD antibody seropositive. The cross-validated estimates of the area under the curve were 0.68 and 0.63 for the two models. Distributions of predicted probabilities overlapped substantially between patients with CD antibody seropositivity and those with seronegativity. Food allergen antibody and IgA were the most important predictors. Biochemical tests had low predictive power but provided methodological insights for future models. Future models may improve by combining biochemical tests with other clinical information but should preferably remain clinically implementable.
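
To make the reported area under the curve (AUC) values easier to interpret, the following is a minimal Python sketch of the AUC as a rank statistic: the probability that a randomly chosen seropositive patient receives a higher predicted probability than a randomly chosen seronegative patient. It does not reproduce the stacked SuperLearner models from the study. The function name `auc` and the example scores are hypothetical and chosen only for illustration; only the patient counts come from the abstract.

```python
from itertools import product


def auc(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs in which the positive case
    has the higher score; ties count as half a win."""
    wins = 0.0
    for p, n in product(pos_scores, neg_scores):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


# Hypothetical predicted probabilities, NOT data from the study.
seropositive = [0.03, 0.02, 0.01, 0.04]
seronegative = [0.01, 0.02, 0.005, 0.03, 0.015]
example_auc = auc(seropositive, seronegative)

# Outcome prevalence from the counts reported in the abstract.
prevalence = 672 / 54877

print(f"example AUC: {example_auc:.3f}")
print(f"seropositive share of cohort: {prevalence:.2%}")
```

Under this reading, an AUC of 0.5 corresponds to ranking no better than chance and 1.0 to perfect separation, which is consistent with the substantial overlap in predicted probabilities described in the abstract for AUCs of 0.68 and 0.63.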

Authors

  • Signe Ulfbeck Schovsbo
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark. signe.ulfbeck.schovsbo@regionh.dk.
  • Michael Charles Sachs
    Section of Biostatistics, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
  • Margit Kriegbaum
    Copenhagen Primary Care Laboratory (CopLab) Database, Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
  • Anne Ahrendt Bjerregaard
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark.
  • Line Tang Møllehave
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark.
  • Susanne Hansen
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark.
  • Bent Struer Lind
    Department of Clinical Biochemistry, Copenhagen University Hospital Hvidovre, Hvidovre, Denmark.
  • Tora Grauers Willadsen
    Copenhagen Primary Care Laboratory (CopLab) Database, Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
  • Allan Linneberg
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark.
  • Christen Lykkegaard Andersen
    Copenhagen Primary Care Laboratory (CopLab) Database, Research Unit for General Practice and Section of General Practice, Department of Public Health, University of Copenhagen, Copenhagen, Denmark.
  • Line Lund Kårhus
    Center for Clinical Research and Prevention, Copenhagen University Hospital - Bispebjerg and Frederiksberg, Copenhagen, Denmark.