Agile text mining for the 2014 i2b2/UTHealth Cardiac risk factors challenge.
Journal:
Journal of Biomedical Informatics
Published Date:
Dec 1, 2015
Abstract
This paper describes the use of an agile text mining platform (Linguamatics' Interactive Information Extraction Platform, I2E) to extract document-level cardiac risk factors from patient records, as defined in the i2b2/UTHealth 2014 challenge. The approach uses a data-driven, rule-based methodology with the addition of a simple supervised classifier. We demonstrate that agile text mining allows for rapid optimization of extraction strategies, while post-processing can leverage annotation guidelines, corpus statistics and logic inferred from the gold standard data. We also show how data imbalance in a training set affects performance. Evaluation of this approach on the test data gave an F-score of 91.7%, one percent behind the top-performing system.
Keywords
Aged
Cardiovascular Diseases
Cohort Studies
Comorbidity
Computer Security
Confidentiality
Data Mining
Diabetes Complications
Electronic Health Records
Female
Humans
Incidence
Longitudinal Studies
Male
Middle Aged
Narration
Natural Language Processing
Pattern Recognition, Automated
Risk Assessment
United Kingdom
Vocabulary, Controlled