3000PA - Towards a National Reference Corpus of German Clinical Language.

Journal: Studies in health technology and informatics
Published Date:

Abstract

We introduce 3000PA, a clinical document corpus composed of 3,000 electronic patient records (EPRs) from three different clinical sites, which will serve as the backbone of a national reference language resource for German clinical NLP. We outline its design principles, the results of a medication annotation campaign, and the evaluation of a first medication information extraction prototype on a subset of 3000PA.

Authors

  • Udo Hahn
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
  • Franz Matthies
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
  • Christina Lohr
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Germany. Lab website: http://www.julielab.de. Contact: Udo.Hahn@uni-jena.de, Franz.Matthies@uni-jena.de, Christina.Lohr@uni-jena.de.
  • Markus Löffler
    Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Universität Leipzig, Germany. Contact: Markus.Loffler@imise.uni-leipzig.de.