Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Automatically extracting medical concepts from medical reports and classifying them with semantic standards is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction from German clinical notes with a customized natural language processing pipeline based on Apache cTAKES. The objective is to test whether this natural language processing tool is suitable for identifying UMLS concepts in German text and mapping them to SNOMED-CT. The pipeline was extended with the German UMLS database and German OpenNLP models, so that it can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset, translated into German, was used. The implemented algorithms were tested on a set of 199 German reports and achieved an average F1 measure of 0.36 without German stemming or pre- and post-processing of the reports.
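As a reading aid for the reported metric, the following is a minimal sketch of how an average F1 measure over a set of reports can be computed. It assumes exact matching of concept identifiers per report and an unweighted mean over reports; the abstract does not state the matching or averaging scheme actually used, and the function names and identifiers below are hypothetical.

```python
from __future__ import annotations


def f1_score(gold: set[str], predicted: set[str]) -> float:
    """F1 for one report: harmonic mean of precision and recall.

    Assumption: a predicted concept counts as correct only if its
    identifier appears in the gold annotations for the same report.
    """
    true_positives = len(gold & predicted)
    if true_positives == 0:
        # Covers empty inputs and the no-overlap case.
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def average_f1(reports: list[tuple[set[str], set[str]]]) -> float:
    """Unweighted mean of per-report F1 (an assumed averaging scheme)."""
    if not reports:
        return 0.0
    scores = [f1_score(gold, predicted) for gold, predicted in reports]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Placeholder identifiers, not real UMLS CUIs or study data.
    example_reports = [
        ({"CUI_A", "CUI_B"}, {"CUI_B", "CUI_C"}),
        ({"CUI_D"}, {"CUI_D"}),
    ]
    print(average_f1(example_reports))
```

Averaging per report weights every report equally regardless of how many concepts it contains; pooling all counts before computing F1 would be an equally plausible alternative.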

Authors

  • Matthias Becker
    Department of Medical Informatics, University of Applied Sciences and Arts, Dortmund, Germany.
  • Britta Böckmann
    Department of Medical Informatics, University of Applied Sciences and Arts, Dortmund, Germany.