Announcement of the German Medical Text Corpus Project (GeMTeX).

Journal: Studies in health technology and informatics
Published Date:

Abstract

The largest publicly funded project to generate a German-language medical text corpus will start in mid-2023. GeMTeX comprises clinical texts from the information systems of six university hospitals. The texts will be made accessible for NLP by annotating entities and relations, and will be enhanced with additional meta-information. Strong governance provides a stable legal framework for the use of the corpus. State-of-the-art NLP methods are used to build, pre-annotate, and annotate the corpus and to train language models. A community will be built around GeMTeX to ensure its sustainable maintenance, use, and dissemination.

Authors

  • Frank Meineke
    University of Leipzig, IMISE, Germany.
  • Luise Modersohn
    JULIE Lab, Friedrich Schiller University Jena, Germany.
  • Markus Loeffler
    IMISE, University of Leipzig, 04103 Leipzig, Germany.
  • Martin Boeker
    Institute for Medical Biometry and Statistics, Medical Center - University of Freiburg, Faculty of Medicine, Stefan-Meier-Str. 26, Freiburg i. Br., 79104, Germany. martin.boeker@uniklinik-freiburg.de.