Extraction of UMLS® Concepts Using Apache cTAKES™ for German Language.

Journal: Studies in health technology and informatics

Published Date: Jan 1, 2016

Abstract

Automatically extracting medical concepts from medical reports and classifying them with semantic standards is useful for standardization and for clinical research. This paper presents an approach to UMLS concept extraction with a customized natural language processing pipeline for German clinical notes using Apache cTAKES. The objective is to test whether the natural language processing tool is suitable for identifying UMLS concepts in German text and mapping them to SNOMED-CT. The pipeline was extended with the German UMLS database and German OpenNLP models, so that it can normalize to domain ontologies such as SNOMED-CT using the German concepts. For testing, the ShARe/CLEF eHealth 2013 training dataset, translated into German, was used. The implemented algorithms were tested on a set of 199 German reports and achieved an average F1 measure of 0.36 without German stemming and without pre- or post-processing of the reports.
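The F1 measure reported above combines precision and recall of the extracted concepts against a gold standard. The following is a minimal sketch of how such a score could be computed per report and then averaged. It assumes concepts are compared as sets of UMLS concept identifiers and that the average is an unweighted mean over reports; the abstract does not state either detail. The function names and the identifiers in the example are hypothetical placeholders, not data from the study.

```python
from typing import Iterable, Set, Tuple


def precision_recall_f1(predicted: Set[str], gold: Set[str]) -> Tuple[float, float, float]:
    """Score one report: predicted vs. gold concept identifiers (assumed set comparison)."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


def average_f1(reports: Iterable[Tuple[Set[str], Set[str]]]) -> float:
    """Unweighted mean of per-report F1 (an assumption about how the average is formed)."""
    scores = [precision_recall_f1(pred, gold)[2] for pred, gold in reports]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical placeholder identifiers, for illustration only.
predicted = {"C0000001", "C0000002", "C0000003"}
gold = {"C0000002", "C0000003", "C0000004", "C0000005"}
p, r, f1 = precision_recall_f1(predicted, gold)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In this example two of the three predicted identifiers appear in the gold set of four, so precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.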

Authors

Matthias Becker

Department of Medical Informatics, University of Applied Sciences and Arts, Dortmund, Germany.

Britta Böckmann

Department of Medical Informatics, University of Applied Sciences and Arts, Dortmund, Germany.

Keywords

Data Mining; Electronic Health Records; Germany; Humans; Information Storage and Retrieval; Natural Language Processing; Unified Medical Language System

External Resources

PubMed (PMID: 27139387)
