Predicting youth diabetes risk using NHANES data and machine learning.

Journal: Scientific Reports

Abstract

Prediabetes and diabetes mellitus (preDM/DM) have become alarmingly prevalent among youth in recent years. However, simple questionnaire-based screening tools that reliably assess diabetes risk are available only for adults, not for youth. As a first step toward developing such a tool, we used a large-scale dataset from the National Health and Nutrition Examination Survey (NHANES) to examine how well a published pediatric clinical screening guideline identifies youth with preDM/DM, as defined by American Diabetes Association diagnostic biomarkers. We assessed the agreement between the clinical guideline and the biomarker criteria using established evaluation measures: sensitivity, specificity, positive/negative predictive value, F-measure for the positive/negative preDM/DM classes, and Kappa. We also compared the performance of the guideline with that of machine learning (ML)-based preDM/DM classifiers derived from the NHANES dataset. Approximately 29% of the 2858 youth in our study population had preDM/DM based on biomarker criteria. The clinical guideline had a sensitivity of 43.1%, a specificity of 67.6%, positive/negative predictive values of 35.2%/74.5%, positive/negative F-measures of 38.8%/70.9%, and a Kappa of 0.1 (95% CI: 0.06–0.14). The performance of the guideline varied across demographic subgroups. Some ML-based classifiers performed comparably to or better than the screening guideline, especially in identifying preDM/DM youth (p = 5.23 × 10). We demonstrated that a recommended pediatric clinical screening guideline did not perform well in identifying preDM/DM status among youth. Additional work is needed to develop a simple yet accurate screener for youth diabetes risk, potentially by using advanced ML methods and a wider range of clinical and behavioral health data.
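All of the agreement measures listed in the abstract can be derived from the 2×2 table that cross-tabulates screening-guideline status (screen-positive/negative) against biomarker-defined preDM/DM status. The Python sketch below is a minimal illustration under stated assumptions, not the authors' code. The names `ConfusionCounts` and `agreement_metrics` are hypothetical. The F-measure is assumed to be the balanced F1 score (harmonic mean of precision and recall), since the abstract does not state a weighting. The Kappa shown is Cohen's Kappa without a confidence interval, because the abstract does not describe how the 95% CI was obtained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 counts: screening guideline result vs. biomarker-defined preDM/DM status."""
    tp: int  # screen-positive and biomarker preDM/DM
    fp: int  # screen-positive but biomarker normal
    fn: int  # screen-negative but biomarker preDM/DM
    tn: int  # screen-negative and biomarker normal


def _f_measure(precision: float, recall: float) -> float:
    # Balanced F1 (assumption): harmonic mean of precision and recall.
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def agreement_metrics(c: ConfusionCounts) -> dict:
    n = c.tp + c.fp + c.fn + c.tn
    sensitivity = c.tp / (c.tp + c.fn)  # recall for the positive (preDM/DM) class
    specificity = c.tn / (c.tn + c.fp)  # recall for the negative class
    ppv = c.tp / (c.tp + c.fp)          # precision for the positive class
    npv = c.tn / (c.tn + c.fn)          # precision for the negative class

    # Cohen's Kappa: observed agreement corrected for the agreement expected
    # by chance given the row and column totals of the 2x2 table.
    p_observed = (c.tp + c.tn) / n
    p_chance = ((c.tp + c.fp) * (c.tp + c.fn) + (c.fn + c.tn) * (c.fp + c.tn)) / n**2
    kappa = (p_observed - p_chance) / (1 - p_chance)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "f_positive": _f_measure(ppv, sensitivity),  # F-measure, preDM/DM class
        "f_negative": _f_measure(npv, specificity),  # F-measure, non-preDM/DM class
        "kappa": kappa,
    }


if __name__ == "__main__":
    # Hypothetical counts for illustration only; these are not the study's data.
    example = ConfusionCounts(tp=40, fp=60, fn=20, tn=80)
    for name, value in agreement_metrics(example).items():
        print(f"{name}: {value:.3f}")
```

With these invented counts, the script prints a sensitivity of 0.667, a specificity of 0.571, a PPV of 0.400, an NPV of 0.800, positive/negative F-measures of 0.500/0.667, and a Kappa of 0.200. Reporting F-measures for both classes, as the abstract does, shows performance on the minority preDM/DM class and on the majority class separately, so a high score on the larger class cannot hide a low score on the smaller one.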

Authors

  • Nita Vangeepuram
    Division of General Pediatrics, Department of Pediatrics, Icahn School of Medicine at Mount Sinai, 1 Gustave L. Levy Place Box 1077, New York, NY, 10029, USA. Electronic address: nita.vangeepuram@mssm.edu.
  • Bian Liu
  • Po-Hsiang Chiu
    Department of Biomedical Informatics, Columbia University, 622 W. 168th Street, New York, NY, USA. Electronic address: pc2694@columbia.edu.
  • Linhua Wang
    Department of Genetics and Genomic Sciences and Icahn Institute for Genomics and Multiscale Biology, Icahn School of Medicine at Mount Sinai, New York, NY, 10029, USA.
  • Gaurav Pandey
    Icahn Institute for Genomics and Multiscale Biology and Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA; Graduate School of Biomedical Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA. Electronic address: gaurav.pandey@mssm.edu.