GeMTeX's De-Identification in Action: Lessons Learned & Devil's Details.

Journal: Studies in health technology and informatics

Abstract

INTRODUCTION: In 2024, the GeMTeX project launched the largest-ever de-identification campaign for German-language clinical reports and, as a pilot study, published GraSCCoPHI, the first de-identified German-language gold standard corpus of synthetic discharge summaries.

Authors

  • Christina Lohr
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Germany, http://www.julielab.de, Udo.Hahn@uni-jena.de, Franz.Matthies@uni-jena.de, Christina.Lohr@uni-jena.de.
  • Jakob Faller
    Medical Center for Information and Communication Technology, Universitätsklinikum Erlangen, Friedrich-Alexander-Universität Erlangen-Nürnberg, Germany.
  • Andrea Riedel
    Erlangen University Hospital, Medical Center for Information and Communication Technology, Erlangen, Germany; Friedrich-Alexander-Universität Erlangen-Nürnberg, Medical Informatics, Erlangen, Germany.
  • Hung Manh Nguyen
    Institute of Biology, Vietnam Academy of Science and Technology, 18 Hoang Quoc Viet Street, Hanoi, 100000, Vietnam. hung_iebr@yahoo.com.
  • Markus Wolfien
    Department of Systems Biology and Bioinformatics, Institute of Computer Science, University of Rostock, Rostock, Germany.
  • Justin Hofenbitzer
    Institute of AI and Informatics in Medicine (AIIM), TUM University Hospital, Technical University of Munich, Munich, Germany.
  • Luise Modersohn
    JULIE Lab, Friedrich Schiller University Jena, Germany.
  • Jutta Romberg
    Data Integration Center, Berlin Institute of Health (BIH) at Charité, Berlin, Germany.
  • Fabian Prasser
    Berlin Institute of Health (BIH), Berlin, Germany.
  • Jazia Omeirat
    Central IT Department, Data Integration Center, University Hospital Essen, Essen, Germany.
  • Yutong Wen
    Data Integration Center, Central IT Department, University Hospital Essen, Essen, Germany.
  • Oksana Galusch
    Data Integration Center, University of Leipzig Medical Center, Leipzig, Germany.
  • Udo Hahn
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
  • Marvin Seiferling
    Klaus Tschira Institute for Integrative Computational Cardiology, University Hospital Heidelberg, Heidelberg, Germany.
  • Christoph Dieterich
    Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg.
  • Peter Klügl
    Averbis GmbH, Freiburg, Germany.
  • Franz Matthies
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
  • Janina Kind
    Leipziger Forschungszentrum für Zivilisationserkrankungen - LIFE Management Cluster, Leipzig, Leipzig University, Germany.
  • Martin Boeker
    Institute for Medical Biometry and Statistics, Medical Center - University of Freiburg, Faculty of Medicine, Stefan-Meier-Str. 26, Freiburg i. Br., 79104, Germany. martin.boeker@uniklinik-freiburg.de.
  • Markus Löffler
    Institute for Medical Informatics, Statistics and Epidemiology (IMISE), Universität Leipzig, Germany. Markus.Loffler@imise.uni-leipzig.de.
  • Frank Meineke
    University of Leipzig, IMISE, Germany.