Facilitating clinical research through automation: Combining optical character recognition with natural language processing.
Journal:
Clinical trials (London, England)
Published Date:
May 24, 2022
Abstract
BACKGROUND/AIMS: Performance status is crucial for most clinical research, as an eligibility criterion, a comorbidity covariate, or a trial endpoint. Yet information on performance status often is embedded as free text within a patient's electronic medical record, rather than coded directly, thereby making this concept extremely difficult to extract for research. Furthermore, performance status information frequently resides in outside reports, which are scanned into the electronic medical record along with thousands of clinic notes. The image format of scanned documents also is a major obstacle to the search and retrieval of information, as natural language processing cannot be applied to unstructured text within an image. We, therefore, utilized optical character recognition software to convert images to a searchable format, allowing the application of natural language processing to identify pertinent performance status data elements within scanned electronic medical records.