Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.

Journal: Journal of biomedical informatics

Published Date: Dec 1, 2015

Abstract

This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven, rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percentage point behind the top-performing system.
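To make the reported F-score and the remark about data imbalance concrete, the following is a minimal sketch of micro- versus macro-averaged F1 over per-class counts. It assumes that the challenge ranked systems by a micro-averaged F-measure, which pools true positives, false positives and false negatives across all risk-factor classes before computing precision and recall. The class names and counts in the usage example are hypothetical and are not taken from the paper or the challenge data. This is not the official i2b2 evaluation script.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Counts:
    """Per-class confusion counts (hypothetical structure for illustration)."""
    tp: int
    fp: int
    fn: int


def f1_from_counts(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when there is nothing to score."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def micro_f1(per_class: dict[str, Counts]) -> float:
    """Pool counts over all classes first, so frequent classes dominate the score."""
    tp = sum(c.tp for c in per_class.values())
    fp = sum(c.fp for c in per_class.values())
    fn = sum(c.fn for c in per_class.values())
    return f1_from_counts(tp, fp, fn)


def macro_f1(per_class: dict[str, Counts]) -> float:
    """Average per-class F1, so a rare, poorly handled class weighs as much as a common one."""
    scores = [f1_from_counts(c.tp, c.fp, c.fn) for c in per_class.values()]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show that the two averages differ.
    counts = {
        "DIABETES": Counts(tp=90, fp=10, fn=5),
        "SMOKER": Counts(tp=40, fp=5, fn=10),
    }
    print(f"micro F1 = {micro_f1(counts):.4f}")
    print(f"macro F1 = {macro_f1(counts):.4f}")
```

With these made-up counts the micro average (about 0.897) sits above the macro average (about 0.883), because the larger, better-handled class contributes more pooled counts. This is one way in which an imbalanced class distribution can shape the headline figure.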

Authors

James Cormack

Linguamatics Ltd., 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK. Electronic address: james.cormack@linguamatics.com.

Chinmoy Nath

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 N. Lake Shore Drive, 11th Floor, Chicago, IL 60611, USA.

David Milward

Linguamatics Ltd., 324 Cambridge Science Park, Milton Road, Cambridge CB4 0WG, UK.

Kalpana Raja

Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.

Siddhartha R Jonnalagadda

Division of Health and Biomedical Informatics, Department of Preventive Medicine, Northwestern University Feinberg School of Medicine, 750 N. Lake Shore Drive, 11th Floor, Chicago, IL 60611, USA.

Keywords

Aged; Cardiovascular Diseases; Cohort Studies; Comorbidity; Computer Security; Confidentiality; Data Mining; Diabetes Complications; Electronic Health Records; Female; Humans; Incidence; Longitudinal Studies; Male; Middle Aged; Narration; Natural Language Processing; Pattern Recognition, Automated; Risk Assessment; United Kingdom; Vocabulary, Controlled

External Resources

PubMed: PMID 26209007
