Natural Language Processing to Improve Prediction of Incident Atrial Fibrillation Using Electronic Health Records.

Journal: Journal of the American Heart Association

Published Date: Jul 29, 2022

Abstract

Background: Models predicting atrial fibrillation (AF) risk, such as Cohorts for Heart and Aging Research in Genomic Epidemiology AF (CHARGE-AF), have not performed as well in electronic health records. Natural language processing (NLP) may improve models by using narrative electronic health record text.

Methods and Results: From a primary care network, we included patients aged ≥65 years with visits between 2003 and 2013 in development (n=32 960) and internal validation cohorts (n=13 992). An external validation cohort from a separate network from 2015 to 2020 included 39 051 patients. Model features were defined using electronic health record codified data and narrative data with NLP. We developed 2 models to predict 5-year AF incidence using (1) codified+NLP data and (2) codified data only and evaluated model performance. The analysis included 2839 incident AF cases in the development cohort and 1057 and 2226 cases in internal and external validation cohorts, respectively. The C-statistic was greater (P<0.001) in codified+NLP model (0.744 [95% CI, 0.735-0.753]) compared with codified-only (0.730 [95% CI, 0.720-0.739]) in the development cohort. In internal validation, the C-statistic of codified+NLP was modestly higher (0.735 [95% CI, 0.720-0.749]) compared with codified-only (0.729 [95% CI, 0.715-0.744]; P=0.06) and CHARGE-AF (0.717 [95% CI, 0.703-0.731]; P=0.002). Codified+NLP and codified-only were well calibrated, whereas CHARGE-AF underestimated AF risk. In external validation, the C-statistic of codified+NLP (0.750 [95% CI, 0.740-0.760]) remained higher (P<0.001) than codified-only (0.738 [95% CI, 0.727-0.748]) and CHARGE-AF (0.735 [95% CI, 0.725-0.746]).

Conclusions: Estimation of 5-year risk of AF can be modestly improved using NLP to incorporate narrative electronic health record data.
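The abstract compares models by their C-statistic. As an illustration only, the sketch below computes a simple pairwise C-statistic for a binary outcome: the share of case/non-case pairs in which the case received the higher predicted risk. It is not the study's method. The abstract does not say how the C-statistic was estimated, and a 5-year incidence analysis would normally account for follow-up time and censoring, which this sketch ignores. The function name `c_statistic` and all numbers in the toy data are invented for the example.

```python
from itertools import product


def c_statistic(scores, outcomes):
    """Pairwise concordance for a binary outcome (ties count as 0.5).

    Simplified illustration: ignores censoring and time to event.
    """
    cases = [s for s, y in zip(scores, outcomes) if y == 1]
    non_cases = [s for s, y in zip(scores, outcomes) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")

    concordant = 0.0
    for case_score, non_case_score in product(cases, non_cases):
        if case_score > non_case_score:
            concordant += 1.0
        elif case_score == non_case_score:
            concordant += 0.5
    return concordant / (len(cases) * len(non_cases))


# Hypothetical toy data (not from the study): 1 = incident AF, 0 = no AF.
outcomes = [1, 0, 1, 0, 0, 1]
codified_only = [0.30, 0.20, 0.25, 0.28, 0.10, 0.40]
codified_plus_nlp = [0.35, 0.15, 0.32, 0.28, 0.10, 0.45]

print(round(c_statistic(codified_only, outcomes), 3))
print(round(c_statistic(codified_plus_nlp, outcomes), 3))
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect ranking of cases above non-cases, which is why differences such as 0.744 versus 0.730 are described as modest.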

Authors

Jeffrey M Ashburner

Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA.
Yuchiao Chang

Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA.
Xin Wang

Key Laboratory of Bio-based Material Science & Technology (Northeast Forestry University), Ministry of Education, Harbin 150040, China.
Shaan Khurshid

Division of Cardiology, Massachusetts General Hospital, Boston, Massachusetts.
Christopher D Anderson

Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts.
Kumar Dahal

Division of Rheumatology, Department of Medicine, Brigham and Women's Hospital, Boston, MA, USA.
Dana Weisenfeld

Department of Rheumatology, Inflammation, and Immunity, Brigham and Women's Hospital, Boston, MA.
Tianrun Cai

Harvard-MIT Center for Regulatory Science, Harvard Medical School, Boston, MA, United States.
Katherine P Liao

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Kavishwar B Wagholikar

Laboratory of Computer Science, Massachusetts General Hospital, 50 Staniford Street, Suite 750, Boston, MA, 02114, USA.
Shawn N Murphy
Steven J Atlas

Department of General Internal Medicine, Massachusetts General Hospital, Boston, MA, United States.
Steven A Lubitz

Cardiovascular Disease Initiative, Broad Institute of Harvard and the Massachusetts Institute of Technology, Cambridge, Massachusetts.
Daniel E Singer

Division of General Internal Medicine, Massachusetts General Hospital, Boston, MA.

Keywords

Atrial Fibrillation; Cohort Studies; Electronic Health Records; Humans; Incidence; Natural Language Processing; Risk Assessment

External Resources

PubMed ID (PMID): 35904194
