Using gut microbiome metagenomic hypervariable features for diabetes screening and typing through supervised machine learning.
Journal: Microbial genomics
PMID: 40063675
Abstract
Diabetes mellitus is a complex metabolic disorder and one of the fastest-growing global public health concerns. The gut microbiota is implicated in the pathophysiology of various diseases, including diabetes. This study used 16S rRNA metagenomic data from a volunteer citizen science initiative to investigate microbial markers associated with diabetes status (positive or negative) and type (type 1 or type 2 diabetes mellitus) using supervised machine learning (ML) models. Microbiome diversity varied according to diabetes status and type. Differential microbial signatures between each diabetes type and the negative group revealed an increased presence of , , , , and in subjects with type 1 diabetes, and of , and the order in subjects with type 2 diabetes. Decision tree, elastic net, random forest (RF) and support vector machine with radial kernel ML algorithms were trained to screen and type diabetes based on the microbial profiles of 76 subjects with type 1 diabetes, 366 subjects with type 2 diabetes and 250 subjects without diabetes. Using the 1000 most variable features, tree-based models were the highest-performing algorithms. The RF screening models achieved the best performance, with an average area under the receiver operating characteristic curve (AUC) of 0.76, although all models had low sensitivity. Reducing the dataset to 500 features produced an AUC of 0.77 and raised sensitivity from 0.46 to 0.80, a 74% relative increase. Model performance improved for the classification of negative-status and type 2 diabetes subjects. Diabetes type models performed best with 500 features, but the metric performed poorly across all model iterations. ML has the potential to facilitate early diagnosis of diabetes based on microbial profiles of the gut microbiome.
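The abstract reports models trained on the 1000 (and then 500) "most variable features" but does not say how variability was measured or how counts were normalized. As a minimal sketch only, assuming each feature (e.g. a taxon) is stored as a list of per-sample relative abundances and that "most variable" means highest population variance across samples, a top-k filter could look like the following. The function names `top_variable_features` and `subset_features` and the dictionary layout are hypothetical, not the authors' pipeline.

```python
from statistics import pvariance


def top_variable_features(abundance, k):
    """Return the k feature names with the highest variance across samples.

    Assumes (hypothetically) that `abundance` maps feature name -> list of
    per-sample values such as relative abundances, all in the same sample order.
    Features with equal variance keep their input order because sorted() is stable.
    """
    ranked = sorted(abundance, key=lambda name: pvariance(abundance[name]), reverse=True)
    return ranked[:k]


def subset_features(abundance, k):
    """Keep only the k most variable features (the study used k=1000 and k=500)."""
    keep = top_variable_features(abundance, k)
    return {name: abundance[name] for name in keep}


if __name__ == "__main__":
    # Toy values for three hypothetical taxa across four samples.
    table = {
        "taxon_A": [0.10, 0.12, 0.11, 0.09],
        "taxon_B": [0.00, 0.40, 0.05, 0.30],
        "taxon_C": [0.20, 0.10, 0.25, 0.15],
    }
    print(top_variable_features(table, 2))  # ['taxon_B', 'taxon_C']
```

One design note: if such a filter is computed before data are split into training and test sets, the test samples influence which features are kept; computing the variances on the training split alone avoids that leakage.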
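Screening performance is summarized with AUC and sensitivity. The hypothetical helpers below show one standard way to compute both for a binary task (1 = diabetes-positive, 0 = negative): sensitivity as TP / (TP + FN), and AUC as the probability that a randomly chosen positive is scored higher than a randomly chosen negative, with ties counted as one half, which equals the area under the ROC curve. They also check the relative sensitivity gain from 0.46 to 0.80. This is an illustrative sketch, not the study's evaluation code; the reported AUCs are averages over model runs whose resampling scheme the abstract does not describe.

```python
# Hypothetical evaluation helpers for a binary screen: 1 = diabetes-positive, 0 = negative.


def sensitivity(y_true, y_pred):
    """True-positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def roc_auc(y_true, scores):
    """AUC as the Mann-Whitney probability that a positive outscores a negative.

    Ties count as half a win. The pairwise loop is O(n_pos * n_neg), which is
    fine for illustration but slow for large datasets.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def relative_change(old, new):
    """Relative change, e.g. (0.80 - 0.46) / 0.46 for the sensitivity gain."""
    return (new - old) / old


if __name__ == "__main__":
    # 3 of the 4 positive/negative pairs are ranked correctly.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.2]))  # 0.75
    # 2 of 3 true positives are detected.
    print(sensitivity([1, 1, 1, 0, 0], [1, 0, 1, 0, 1]))
    print(round(relative_change(0.46, 0.80), 2))  # 0.74
```

Because AUC depends only on how scores rank positives against negatives, a model can reach a moderate AUC while still showing low sensitivity at its chosen decision threshold.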