Clinically reported covert cerebrovascular disease and risk of neurological disease: a whole-population cohort of 395,273 people using natural language processing

Journal: medRxiv
Published Date:

Abstract

Understanding the relevance of covert cerebrovascular disease (CCD) for later health will allow clinicians to monitor patients and target interventions more effectively. This study aimed to examine the association between clinically reported CCD, measured using natural language processing (NLP), and subsequent disease risk. We conducted a retrospective e-cohort study using linked health record data. From all people who had clinical brain imaging in Scotland from 2010 to 2018, we selected those with no prior hospitalisation for neurological disease. The data were analysed from March 2024 to June 2025. Four phenotypes were identified with NLP of imaging reports: white matter hypoattenuation or hyperintensities (WMH), lacunes, cortical infarcts and cerebral atrophy. For each phenotype, we calculated adjusted hazard ratios (aHRs) for stroke, dementia and Parkinson's disease (conditions previously associated with CCD), epilepsy (a brain-based control condition) and colorectal cancer (a non-brain control condition), adjusted for age, sex, deprivation, region, scan modality and pre-scan healthcare. Of 395,273 people with brain imaging and no history of neurological disease, 145,978 (37%) had ≥1 phenotype. The aHRs for any stroke were: WMH 1.4 (95% CI: 1.3–1.4), lacunes 1.6 (1.5–1.6), cortical infarcts 1.7 (1.6–1.8) and cerebral atrophy 1.1 (1.0–1.1). The aHRs for any dementia were: WMH 1.3 (1.3–1.3), lacunes 1.0 (0.9–1.0), cortical infarcts 1.1 (1.0–1.1) and cerebral atrophy 1.7 (1.7–1.7). The aHRs for Parkinson's disease were: WMH 1.1 (1.0–1.2), lacunes 1.1 (0.9–1.2), cortical infarcts 0.7 (0.6–0.9) and cerebral atrophy 1.4 (1.3–1.5). For epilepsy and colorectal cancer, the 95% CIs of the aHRs for the CCD phenotypes overlapped the null. NLP identified CCD and atrophy phenotypes from routine clinical imaging reports, and these phenotypes had important associations with future stroke, dementia and Parkinson's disease.
Prevention of neurological disease in people with CCD should be a priority for healthcare providers and policymakers.

Are measures of covert cerebrovascular disease (CCD) associated with the risk of subsequent disease (stroke, dementia, Parkinson's disease, epilepsy and colorectal cancer)? This study used a validated NLP algorithm to identify CCD (white matter hypoattenuation or hyperintensities, lacunes and cortical infarcts) and cerebral atrophy from both MRI and computed tomography (CT) imaging reports generated during routine healthcare in more than 395,000 people in Scotland. In adjusted models, people with atrophy had a higher risk of dementia (particularly Alzheimer's disease), and people with cortical infarcts had a higher risk of stroke. Associations with an age-associated control outcome (colorectal cancer) were neutral, which argues against confounding by age and supports a causal relationship. The study also highlights differential associations: cerebral atrophy was most strongly associated with dementia, and cortical infarcts with stroke. CCD or atrophy reported on brain imaging in routine clinical practice is associated with a higher risk of stroke or dementia. Evidence is needed to support treatment strategies that reduce this risk. NLP can identify these important, otherwise uncoded, disease phenotypes, allowing research at scale into imaging-based biomarkers of dementia and stroke.
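The phenotype-identification step can be illustrated with a minimal rule-based sketch. This is not the study's validated NLP algorithm. The keyword patterns, the phenotype labels (`WMH`, `lacune`, `cortical_infarct`, `atrophy`), the sentence splitting and the crude negation rule (any negation cue earlier in the same sentence) are all illustrative assumptions, and the function name `tag_report` is hypothetical.

```python
import re

# Hypothetical keyword patterns per phenotype (illustrative assumption,
# NOT the validated study lexicon).
PHENOTYPE_PATTERNS = {
    "WMH": re.compile(
        r"\bwhite matter (?:hypoattenuation|hypodensit(?:y|ies)|hyperintensit(?:y|ies))\b"
        r"|\bleu[ck]oaraiosis\b",
        re.IGNORECASE,
    ),
    "lacune": re.compile(r"\blacun(?:es?|ar infarcts?)\b", re.IGNORECASE),
    "cortical_infarct": re.compile(r"\bcortical infarcts?\b", re.IGNORECASE),
    "atrophy": re.compile(r"\batroph(?:y|ic)\b", re.IGNORECASE),
}

# Crude negation heuristic (assumption): a mention is treated as negated if
# any of these cues appears earlier in the same sentence.
NEGATION_CUES = re.compile(r"\b(?:no|not|without|negative for)\b", re.IGNORECASE)


def tag_report(report: str) -> set[str]:
    """Return the set of phenotype labels asserted (not negated) in a report."""
    found = set()
    # Split into sentences on '.' or ';' followed by whitespace.
    for sentence in re.split(r"(?<=[.;])\s+", report):
        for phenotype, pattern in PHENOTYPE_PATTERNS.items():
            for match in pattern.finditer(sentence):
                preceding = sentence[: match.start()]
                if not NEGATION_CUES.search(preceding):
                    found.add(phenotype)
    return found


if __name__ == "__main__":
    text = ("Periventricular white matter hyperintensities. Old left cortical "
            "infarct. No lacunes. No evidence of atrophy.")
    print(sorted(tag_report(text)))
```

A real pipeline needs much richer vocabularies, negation and uncertainty handling, and validation against manually annotated reports. The heuristic above, for example, would wrongly treat "No acute infarct but an old cortical infarct." as negated, because the leading "No" applies to the whole sentence.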

Authors

  • Matthew H Iveson; Mome Mukerjee; Emma M Davidson; Huayu Zhang; Laura Sherlock; Emily L Ball; Grant Mair; Alice Hosking; Heather Whalley; Michael T C Poon; Joanna M Wardlaw; David Kent; Richard Tobin; Claire Grover; Beatrice Alex; William N Whiteley

Categories