Deep Longitudinal Clusters of Type 2 Diabetes Pathophysiology and their Risk of Cardiovascular Disease Events and All-Cause Mortality
Journal: medRxiv
Published Date: Jun 3, 2026
Abstract
Objective: Despite the complex and non-linear progression of diabetes, its shared pathways with atherosclerotic cardiovascular disease (ASCVD) are conventionally described using models based on single time points. We identified longitudinal diabetes clusters before diagnosis using deep learning and studied their association with ASCVD events and mortality.
Methods: We analyzed 157,670 visits from 15,871 adults (25-65 years) without diabetes from four pooled U.S. cohorts (median follow-up: 22 years [IQR: 9-30]). A gated recurrent unit model with decay (GRU-D) was used to predict 1-year risk of diabetes or censoring within 10 years by learning longitudinal embeddings across 25 clinical characteristics and biomarkers. Parallel Factor Analysis-2 (PARAFAC-2) and Gaussian mixture models (GMM) were used to group longitudinal participant representations into clusters. Landmark-time Cox proportional hazards regressions, with time measured relative to the last observation in the training window, were used to study covariate-adjusted associations of clusters with ASCVD and mortality. The prognostic utility of clusters beyond the PREVENT risk score was assessed using Harrell's C-index. Findings were replicated in a fifth cohort.
Results: The analytic sample had a mean age of 49 years (SD: 11), was 58% female, and was 68% white; 1,202 participants (8%) developed diabetes within the first 10 years. We identified five clusters (Clusters A to E) that differed in their clinical characteristics over time. Cluster E (46%) had the highest cumulative incidence of diabetes in the study period, followed by Cluster C (40%) and Cluster A (38%). Cluster C, which was defined by older age, high blood pressure, and suboptimal renal function at the first visit, had higher rates of ASCVD (HR: 1.09, 95% CI: 0.98-1.21) and mortality (HR: 1.08, 95% CI: 1.00-1.16) relative to Cluster A, despite being similar in age and BMI at the first visit. Relative to Cluster A, all other clusters had similar or lower rates of ASCVD and mortality. We observed substantial cohort effects for three clusters (Clusters C to E), which were based on only two cohorts. The two clusters (Clusters A and B) that included participants from all four cohorts were reproduced in the fifth cohort and showed similar rates of outcomes. Clusters did not improve ASCVD prognostication relative to a model that included only the PREVENT risk score.
Conclusions: Longitudinal clusters reveal substantial heterogeneity in the period before diabetes diagnosis and in the associated risk of ASCVD and mortality. However, the discovered clusters may, in part, be explained by cohort effects arising from variations in recruitment and in visit patterns after recruitment.
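To illustrate how Harrell's C-index measures discrimination, the following is a minimal sketch in plain Python. It is not the authors' implementation: the function name `harrell_c_index` and the toy data are hypothetical, and pairs with tied follow-up times are simply skipped. The index is the fraction of comparable pairs (the participant with the shorter follow-up had an event) in which the model assigns the higher risk score to the participant whose event came first.

```python
def harrell_c_index(times, events, risk_scores):
    """Harrell's C-index for right-censored data (simplified sketch).

    times       : follow-up time per participant
    events      : 1 if the event was observed, 0 if censored
    risk_scores : model output, higher means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue  # a censored participant cannot anchor a comparable pair
        for j in range(n):
            # Pair is comparable when i had the event strictly before j's time.
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical toy data, for illustration only (not from the study).
times = [2, 4, 6, 8, 10]
events = [1, 1, 0, 1, 0]
base_scores = [0.9, 0.4, 0.5, 0.6, 0.1]   # stand-in for a baseline risk score
perfect_scores = [5, 4, 3, 2, 1]          # ranks participants exactly by time
constant_scores = [1, 1, 1, 1, 1]         # uninformative model

c_base = harrell_c_index(times, events, base_scores)
c_perfect = harrell_c_index(times, events, perfect_scores)
c_constant = harrell_c_index(times, events, constant_scores)
```

In a comparison like the one described in the Methods, the index would be computed once for risk scores from a model with the baseline score alone and once for a model that also includes cluster membership; similar values indicate that clusters add little discrimination.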