Few-Shot and Prompt Training for Text Classification in German Doctor's Letters.

Journal: Studies in health technology and informatics
Published Date:

Abstract

We classified sentences in German cardiovascular doctor's letters into eleven section categories using pattern-exploiting training, a prompt-based method for text classification in few-shot learning scenarios (20, 50, and 100 instances per class). Language models with various pre-training approaches were evaluated on CARDIO:DE, a freely available German clinical routine corpus. Prompting improves accuracy by 5-28% compared to traditional methods, reducing manual annotation effort and computational cost in a clinical setting.
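
As a rough illustration of the idea behind pattern-exploiting training, the sketch below recasts sentence classification as a cloze task: a pattern wraps the sentence around a mask token, and a verbalizer maps each label to one candidate token for that mask. Everything in it is an assumption made for illustration and is not taken from the study: the three labels, the German template, the cue words, and the `toy_mask_logits` function, which stands in for a real masked language model.

```python
import math

MASK = "[MASK]"

# Hypothetical verbalizer: label -> token the model should predict at the mask.
# Three illustrative labels only, not the eleven section categories of the study.
VERBALIZER = {
    "diagnosis": "Diagnose",
    "medication": "Medikation",
    "anamnesis": "Anamnese",
}


def pattern(sentence: str) -> str:
    """Wrap a sentence in a (hypothetical) cloze template with one mask token."""
    return f"{sentence} Abschnitt: {MASK}."


def toy_mask_logits(cloze: str, candidates: list[str]) -> dict[str, float]:
    """Stand-in for a masked language model (assumption, not a real model).

    Scores each candidate token by counting hand-picked cue words in the
    cloze text. A real setup would read the model's logits at the mask
    position for each verbalizer token instead.
    """
    cues = {
        "Diagnose": ["Verdacht", "Diagnose", "Infarkt"],
        "Medikation": ["mg", "Tablette", "täglich"],
        "Anamnese": ["berichtet", "klagt", "seit"],
    }
    return {tok: float(sum(cue in cloze for cue in cues[tok])) for tok in candidates}


def classify(sentence: str) -> tuple[str, dict[str, float]]:
    """Return the predicted label and a softmax over the verbalizer tokens."""
    cloze = pattern(sentence)
    logits = toy_mask_logits(cloze, list(VERBALIZER.values()))
    z = sum(math.exp(v) for v in logits.values())
    probs = {label: math.exp(logits[tok]) / z for label, tok in VERBALIZER.items()}
    return max(probs, key=probs.get), probs


if __name__ == "__main__":
    label, probs = classify("Patient erhält Ramipril 5 mg täglich.")
    print(label, probs)
```

In an actual few-shot setup, the scoring function would be a pre-trained masked language model fine-tuned on the small labeled set, with the softmax restricted to the verbalizer tokens as above.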

Authors

  • Phillip Richter-Pechanski
    Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg.
  • Philipp Wiesenbach
    Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany.
  • Dominic M Schwab
    Department of Internal Medicine III, University Hospital Heidelberg, Germany.
  • Christina Kiriakou
    Department of Internal Medicine III, University Hospital Heidelberg, Germany.
  • Mingyang He
    Klaus Tschira Institute for Computational Cardiology, Heidelberg, Germany.
  • Nicolas A Geis
    Department of Internal Medicine III, University Hospital Heidelberg, Germany.
  • Anette Frank
    Department of Computational Linguistics, Heidelberg University, Germany.
  • Christoph Dieterich
    Section of Bioinformatics and Systems Cardiology, Klaus Tschira Institute for Integrative Computational Cardiology, Heidelberg.