Secondary use of electronic health records for building cohort studies through top-down information extraction.
Journal:
Journal of biomedical informatics
Published Date:
Nov 21, 2014
Abstract
Controlled clinical trials are usually supported with an in-front data aggregation system, which supports the storage of relevant information according to the trial context within a highly structured environment. In contrast to the documentation of clinical trials, daily routine documentation has many characteristics that influence data quality. One such characteristic is the use of non-standardized text, which is an indispensable part of information representation in clinical information systems. Based on a cohort study we highlight challenges for mining electronic health records targeting free text entry fields within semi-structured data sources. Our prototypical information extraction system achieved an F-measure of 0.91 (precision=0.90, recall=0.93) for the training set and an F-measure of 0.90 (precision=0.89, recall=0.92) for the test set. We analyze the obtained results in detail and highlight challenges and future directions for the secondary use of routine data in general.