ATN Classification and Machine-Learned Plasma Biomarker Phenotypes Reveal Distinct Alzheimer's Pathology in a Population-Based Cohort

Journal: medRxiv
Published Date:

Abstract

Background: The ATN (Amyloid/Tau/Neurodegeneration) framework provides a theory-driven approach to Alzheimer's disease (AD) classification using binary biomarker cutoffs, while unsupervised machine learning offers data-driven phenotyping. The concordance between these approaches in population-representative samples remains incompletely characterized.

Objective: To compare plasma ATN classification with data-driven clustering methods and to evaluate their associations with cognitive outcomes in a nationally representative cohort.

Methods: We analyzed plasma biomarkers (Aβ42/40 ratio, p-tau181, NfL, GFAP) from 4,465 participants aged ≥51 years in the Health and Retirement Study 2016 Venous Blood Study. ATN profiles were classified using literature-based cutoffs. We applied k-means clustering, Gaussian mixture modeling, and variational autoencoder (VAE) dimensionality reduction to identify data-driven biomarker phenotypes. Agreement between ATN profiles and clusters was quantified using the adjusted Rand index (ARI) and normalized mutual information (NMI). Longitudinal analyses examined associations with cognitive decline over 4 years (2016-2020).

Results: The analytic sample included 4,465 individuals (mean age 69.7 ± 10.4 years; 58.7% female; 75.8% non-Hispanic White). ATN classification yielded 14 profiles, with A+/T-/N- (27.4%) and A-/T-/N- (22.6%) the most prevalent. K-means clustering identified 4 as the optimal number of clusters, each with a distinct biomarker signature. Agreement between ATN profiles and clusters was modest (ARI=0.119, NMI=0.113).
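For reference, the two agreement statistics can be computed directly from the contingency table of two labelings. The following is a minimal pure-Python sketch, assuming the Hubert-Arabie form of the ARI and an NMI normalized by the arithmetic mean of the two entropies (the default in scikit-learn's `normalized_mutual_info_score`); the abstract does not state which normalization the authors used. The function names and toy labels are illustrative only and are not study data.

```python
from collections import Counter
from math import comb, log


def adjusted_rand_index(labels_a, labels_b):
    """Hubert-Arabie ARI between two partitions of the same observations."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    pairs = comb(len(labels_a), 2)
    if pairs == 0:  # fewer than two observations: agreement is trivial
        return 1.0
    # Pairs of observations grouped together in both partitions (contingency cells).
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    # Pairs grouped together within each partition separately (row/column margins).
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = same_a * same_b / pairs
    max_index = (same_a + same_b) / 2
    if max_index == expected:  # both partitions trivial (one cluster or all singletons)
        return 1.0
    return (same_both - expected) / (max_index - expected)


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * log(c / n) for c in Counter(labels).values())


def normalized_mutual_information(labels_a, labels_b):
    """Mutual information divided by the arithmetic mean of the two entropies."""
    if len(labels_a) != len(labels_b):
        raise ValueError("label sequences must have equal length")
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    mi = sum(
        (n_ab / n) * log(n * n_ab / (count_a[a] * count_b[b]))
        for (a, b), n_ab in Counter(zip(labels_a, labels_b)).items()
    )
    h_a, h_b = entropy(labels_a), entropy(labels_b)
    if h_a == 0 and h_b == 0:  # both labelings contain a single class
        return 1.0
    return mi / ((h_a + h_b) / 2)


# Toy example: hypothetical ATN profiles vs. hypothetical cluster labels.
atn = ["A+/T-/N-", "A+/T-/N-", "A-/T-/N-", "A-/T-/N-", "A+/T+/N+", "A+/T+/N+"]
clusters = [3, 3, 3, 3, 1, 1]
print(adjusted_rand_index(atn, clusters))  # 4/9, about 0.444
print(normalized_mutual_information(atn, clusters))
```

Both indices are invariant to relabeling, which is what allows ATN profile strings to be compared with arbitrary integer cluster IDs.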
Sensitivity analysis excluding GFAP from clustering reduced agreement substantially (ARI=0.03 vs. 0.119 with GFAP, a 74.5% decrease) [Table S12], indicating that GFAP accounts for most of the observed concordance between clustering and ATN classification, with only about one-quarter of it attributable to the three shared biomarkers. Additional sensitivity analyses confirmed that k=4 provides finer biomarker resolution than k=3 by retaining biomarker-extreme subgroups [Table S13], and that Cluster 4 represents a stable biological structure across distance metrics despite its small size [Table S14]. Cluster 1 (n=51, 1.2%) showed severe pathology; Cluster 3 (n=3,479, 78.6%), the largest and most heterogeneous group, spanned minimal to moderate pathology across all ATN profiles; and Cluster 4 (n=14, 0.3%) was a small but stable biomarker-defined non-AD subgroup (Jaccard=0.779). The VAE revealed localized nonlinear structure: although silhouette values in the latent space are not directly comparable to the clustering silhouettes, the VAE embedding showed clearer local separation, whereas PCA explained more variance (67.1%). Both ATN profiles and clusters predicted 4-year cognitive decline (ATN R²=0.024, p<0.001; clusters R²=0.019, p<0.001).

Conclusions: Theory-driven ATN classification and data-driven biomarker phenotyping capture partially overlapping but largely distinct information. The modest concordance (ARI=0.119) reflects GFAP's contribution to shared structure, with most alignment arising from GFAP rather than from the three ATN biomarkers alone (ARI=0.03). The primary remaining source of discordance is the binary versus continuous representation of biomarker variation. Sensitivity analyses showed that k=4 provides finer biomarker resolution than k=3 and that Cluster 4 represents a small but reproducible biomarker-defined subgroup.
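The abstract reports a Jaccard coefficient of 0.779 for Cluster 4 but does not describe how it was obtained. One widely used approach is bootstrap cluster stability, as implemented in the R function `fpc::clusterboot`: resample participants with replacement, re-cluster, and average the best-match Jaccard similarity between the original cluster and any resampled cluster. The sketch below follows that idea under simplifying assumptions. Duplicate draws are collapsed before re-clustering, and `midpoint_split` is a hypothetical one-dimensional stand-in for re-running k-means on the real four-biomarker data.

```python
import random


def jaccard(a, b):
    """Jaccard similarity of two sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def bootstrap_cluster_stability(ids, original_cluster, cluster_fn, n_boot=200, seed=0):
    """Mean best-match Jaccard of one original cluster over bootstrap resamples.

    ids: list of participant identifiers.
    original_cluster: set of ids assigned to the cluster of interest.
    cluster_fn: callable mapping a list of ids to a list of sets (the new clusters).
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_boot):
        # Draw with replacement, then collapse duplicates (a simplification).
        present = set(rng.choices(ids, k=len(ids)))
        restricted = original_cluster & present
        if not restricted:
            continue  # cluster of interest absent from this resample
        new_clusters = cluster_fn(sorted(present))
        scores.append(max(jaccard(restricted, c) for c in new_clusters))
    return sum(scores) / len(scores) if scores else float("nan")


# Hypothetical one-dimensional biomarker values (not study data).
values = {"p1": 1.0, "p2": 1.5, "p3": 2.0, "p4": 2.5, "p5": 9.0, "p6": 9.5, "p7": 10.0}


def midpoint_split(sub_ids):
    """Hypothetical 2-cluster rule: split at the midpoint of the observed range."""
    cut = (min(values[i] for i in sub_ids) + max(values[i] for i in sub_ids)) / 2
    return [{i for i in sub_ids if values[i] > cut}, {i for i in sub_ids if values[i] <= cut}]


high_cluster = {i for i, v in values.items() if v > 5}
print(bootstrap_cluster_stability(list(values), high_cluster, midpoint_split))
```

Because the data-driven split is re-estimated in every resample, a resample that happens to miss the low-valued participants moves the cut into the high group and lowers the best-match Jaccard; averaging over resamples turns such shifts into a single stability score.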
Both approaches predicted cognitive decline with modest effect sizes (R²=1.9-2.4%), consistent with prior population-based studies. Integrating theory-driven and data-driven frameworks may support a more comprehensive characterization of AD-related pathology in population research.
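The reported R² values (0.024 for ATN, 0.019 for clusters) express the share of variance in 4-year cognitive change explained by group membership. For an ordinary least squares model containing only an intercept and indicator variables for the groups, the fitted values are the group means, so R² reduces to the between-group sum of squares divided by the total sum of squares. The sketch below assumes unadjusted models and a per-participant change score; the abstract does not state whether covariates were included or how decline was operationalized, and the inputs are toy values.

```python
from collections import defaultdict


def r_squared_categorical(groups, outcome):
    """R^2 of an OLS fit of outcome on group indicators (with intercept).

    With only group indicators, fitted values equal group means, so
    R^2 = SS_between / SS_total.
    """
    if len(groups) != len(outcome):
        raise ValueError("groups and outcome must have equal length")
    n = len(outcome)
    grand_mean = sum(outcome) / n
    by_group = defaultdict(list)
    for g, y in zip(groups, outcome):
        by_group[g].append(y)
    ss_total = sum((y - grand_mean) ** 2 for y in outcome)
    ss_between = sum(
        len(ys) * (sum(ys) / len(ys) - grand_mean) ** 2 for ys in by_group.values()
    )
    return ss_between / ss_total if ss_total > 0 else 0.0


# Toy example: two hypothetical groups and a hypothetical cognitive change score.
print(r_squared_categorical(["A", "A", "B", "B"], [1.0, 3.0, 5.0, 7.0]))  # 0.8
```

Read this way, an R² near 0.02 means that group membership accounts for roughly 2% of the between-person variance in the change score.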

Authors

  • Chea
  • E. F.

Categories