Automatic classification of scanned electronic health record documents.

Journal: International journal of medical informatics

Abstract

OBJECTIVES: Electronic Health Records (EHRs) contain scanned documents of many types from a variety of sources, such as identification cards, radiology reports, and clinical correspondence. We describe the distribution of scanned documents at one health institution, as well as the design and evaluation of a system that classifies documents as clinically relevant or non-clinically relevant and further assigns them to finer-grained subcategories. Our objective is to demonstrate that text classification systems can accurately classify scanned documents.
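The abstract does not state which model the authors used. The sketch below is only a hypothetical illustration of a two-level scheme: a top-level clinical vs. non-clinical decision, followed by a subcategory chosen within that group. It assumes the scanned pages have already been converted to text (e.g., by OCR) and swaps in multinomial naive Bayes as the classifier. The class names, labels, and whitespace tokenizer are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Hypothetical tokenizer: lowercase and split on whitespace.
    # Real OCR output would need far more cleanup than this.
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing (illustrative stand-in)."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(labels)
        return self

    def predict(self, doc):
        tokens = tokenize(doc)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods of each token.
            score = math.log(n_docs / self.total_docs)
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for tok in tokens:
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


class TwoLevelClassifier:
    """Hypothetical hierarchy: top level picks a group, second level picks a subtype within it."""

    def fit(self, docs, top_labels, sub_labels):
        self.top = NaiveBayes().fit(docs, top_labels)
        self.sub = {}
        for group in set(top_labels):
            # Train one subcategory model per top-level group, on that group's documents only.
            idx = [i for i, t in enumerate(top_labels) if t == group]
            self.sub[group] = NaiveBayes().fit(
                [docs[i] for i in idx], [sub_labels[i] for i in idx]
            )
        return self

    def predict(self, doc):
        group = self.top.predict(doc)
        return group, self.sub[group].predict(doc)
```

Training one subcategory model per top-level group keeps each second-level decision restricted to plausible subtypes; a flat classifier over all subcategories at once would be an equally reasonable alternative.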

Authors

  • Heath Goodrum
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, TX, United States.
  • Kirk Roberts
    The University of Texas Health Science Center at Houston, USA.
  • Elmer V Bernstam
    Center for Computational Biomedicine, School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX 77030, USA; Department of Public Health Science, Medical University of South Carolina, 135 Cannon Street, Suite 303, Charleston, SC 29425, USA; and Department of Investigational Cancer Therapeutics, Institute for Personalized Cancer Therapy, UT-MD Anderson Cancer Center, 1400 Holcombe Blvd., FC8.3044, Houston, TX 77030, USA.