Unsupervised Latent Pattern Analysis for Estimating Type 2 Diabetes Risk in Undiagnosed Populations
Journal:
arXiv
Published Date:
May 27, 2025
Abstract
The global prevalence of diabetes, particularly type 2 diabetes mellitus
(T2DM), is rapidly increasing, posing significant health and economic
challenges. T2DM not only disrupts blood glucose regulation but also damages
vital organs such as the heart, kidneys, eyes, nerves, and blood vessels,
leading to substantial morbidity and mortality. In the US alone, the economic
burden of diagnosed diabetes exceeded \$400 billion in 2022. Early detection of
individuals at risk is critical to mitigating these impacts. While machine
learning approaches for T2DM prediction are increasingly adopted, many rely on
supervised learning, which is often limited by the lack of confirmed negative
cases. To address this limitation, we propose a novel unsupervised framework
that integrates Non-negative Matrix Factorization (NMF) with statistical
techniques to identify individuals at risk of developing T2DM. Our method
identifies latent patterns of multimorbidity and polypharmacy among diagnosed
T2DM patients and applies these patterns to estimate the T2DM risk in
undiagnosed individuals. By leveraging data-driven insights from comorbidity
and medication usage, our approach provides an interpretable and scalable
solution that can assist healthcare providers in implementing timely
interventions, ultimately improving patient outcomes and potentially reducing
the future health and economic burden of T2DM.