Facilitating clinical research through automation: Combining optical character recognition with natural language processing.

Journal: Clinical trials (London, England)
Published Date:

Abstract

BACKGROUND/AIMS: Performance status is crucial for most clinical research, whether as an eligibility criterion, a comorbidity covariate, or a trial endpoint. Yet performance status is often embedded as free text within a patient's electronic medical record rather than coded directly, which makes it extremely difficult to extract for research. Furthermore, performance status information frequently resides in outside reports, which are scanned into the electronic medical record along with thousands of clinic notes. The image format of scanned documents is a further major obstacle to search and retrieval, because natural language processing cannot be applied to text embedded within an image. We therefore used optical character recognition software to convert the images to a searchable format, allowing natural language processing to be applied to identify pertinent performance status data elements within scanned electronic medical records.
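
The two-stage idea described above (optical character recognition first, then natural language processing on the recovered text) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the software or the extraction rules used. The function names, the `ocr` callable, and the regular expressions for ECOG and Karnofsky (KPS) scores are all assumptions introduced here for illustration, and simple pattern matching stands in for whatever natural language processing method the study applied.

```python
import re
from typing import Callable, Iterable, List, NamedTuple


class PerformanceStatusMention(NamedTuple):
    page: int      # 1-based page number within the scanned record
    scale: str     # "ECOG" or "KPS" (scales assumed for illustration)
    value: int     # score as written in the text
    snippet: str   # surrounding text, kept for manual review


# Hypothetical patterns; the source does not specify its extraction rules.
_PATTERNS = {
    "ECOG": re.compile(
        r"\bECOG(?:\s+performance\s+status|\s+PS)?\s*(?:of|is|=|:)?\s*([0-5])\b",
        re.IGNORECASE,
    ),
    "KPS": re.compile(
        r"\b(?:KPS|Karnofsky(?:\s+performance\s+status)?)\s*(?:of|is|=|:)?\s*(100|[1-9]0)\b\s*%?",
        re.IGNORECASE,
    ),
}


def extract_performance_status(pages: Iterable[str]) -> List[PerformanceStatusMention]:
    """Scan OCR output, one string per page, for performance status scores."""
    mentions = []
    for page_number, text in enumerate(pages, start=1):
        # OCR output often breaks lines mid-phrase, so collapse all whitespace.
        normalized = " ".join(text.split())
        for scale, pattern in _PATTERNS.items():
            for match in pattern.finditer(normalized):
                start = max(match.start() - 30, 0)
                end = min(match.end() + 30, len(normalized))
                mentions.append(
                    PerformanceStatusMention(
                        page_number, scale, int(match.group(1)), normalized[start:end]
                    )
                )
    return mentions


def process_scanned_record(
    images: Iterable[bytes], ocr: Callable[[bytes], str]
) -> List[PerformanceStatusMention]:
    """Run the supplied OCR function on each page image, then extract mentions."""
    return extract_performance_status(ocr(image) for image in images)
```

The `ocr` argument is left as a plain callable so that any engine can be plugged in; one possible choice, again an assumption rather than something stated in the abstract, is Tesseract through a wrapper such as `pytesseract.image_to_string`. Keeping a text snippet with each match allows a reviewer to confirm the extracted value against the original scan.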

Authors

  • Julie Hom
    Department of Diabetes & Cancer Discovery Science, City of Hope, Duarte, CA, USA.
  • Janet Nikowitz
    Department of Diabetes & Cancer Discovery Science, City of Hope, Duarte, CA, USA.
  • Rebecca Ottesen
    Department of Diabetes & Cancer Discovery Science, City of Hope, Duarte, CA, USA.
  • Joyce C Niland
    Department of Diabetes & Cancer Discovery Science, City of Hope, Duarte, CA, USA.