Creation of a new longitudinal corpus of clinical narratives.

Journal: Journal of Biomedical Informatics

Published Date: Dec 1, 2015

Abstract

The 2014 i2b2/UTHealth Natural Language Processing (NLP) shared task featured a new longitudinal corpus of 1304 records representing 296 diabetic patients. The corpus contains three cohorts: (1) patients who have a diagnosis of coronary artery disease (CAD) in their first record and continue to have it in subsequent records; (2) patients who do not have a diagnosis of CAD in the first record but develop it by the last record; and (3) patients who do not have a diagnosis of CAD in any record. This paper details the process used to select records for the corpus and provides an overview of novel research uses for it. It is the only annotated corpus of longitudinal clinical narratives currently available to the general research community.
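
The three cohort definitions can be read as a simple rule over each patient's chronologically ordered records. The following is a minimal sketch in Python, not the authors' record-selection procedure: it assumes each record has already been reduced to a boolean flag saying whether a CAD diagnosis is present, and the function name and cohort labels are hypothetical, introduced only for illustration.

```python
from typing import Optional, Sequence


def classify_cohort(cad_flags: Sequence[bool]) -> Optional[str]:
    """Assign a patient to one of the three cohorts described in the abstract.

    cad_flags[i] is True if the patient's i-th record (in chronological
    order) carries a CAD diagnosis. This input format is an assumption.
    """
    if not cad_flags:
        return None  # no records, nothing to classify
    if all(cad_flags):
        # CAD in the first record and in every subsequent record
        return "cad_throughout"
    if not any(cad_flags):
        # no CAD diagnosis in any record
        return "no_cad"
    if not cad_flags[0] and cad_flags[-1]:
        # no CAD in the first record, CAD present by the last record
        return "develops_cad"
    # patterns the abstract does not describe (e.g. CAD first, absent later)
    return None
```

Patterns that match none of the three definitions return `None` rather than being forced into a cohort, since the abstract does not say how such patients were handled.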

Authors

Vishesh Kumar

Dartmouth-Hitchcock Medical Center, Division of Cardiology, Lebanon, NH, USA.

Amber Stubbs

School of Library and Information Science, Simmons College, Boston, MA, USA. Electronic address: stubbs@simmons.edu.

Stanley Shaw

Harvard Medical School, Boston, MA 02115, USA; Center for Systems Biology, Massachusetts General Hospital, Boston, MA 02114, USA.

Ozlem Uzuner

Department of Information Studies, University at Albany, SUNY, Albany, NY, USA.

Keywords

Aged; Boston; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Coronary Artery Disease; Data Mining; Diabetes Complications; Electronic Health Records; Female; Humans; Incidence; Male; Middle Aged; Narration; Natural Language Processing; Risk Assessment; Vocabulary, Controlled

External Resources

PubMed ID (PMID): 26433122
