CLIN-X: pre-trained language models and a study on cross-task transfer for concept extraction in the clinical domain.

Journal: Bioinformatics (Oxford, England)

Abstract

MOTIVATION: The field of natural language processing (NLP) has recently seen a major shift toward using pre-trained language models for solving almost any task. Despite showing great improvements on benchmark datasets for various tasks, these models often perform suboptimally in non-standard domains such as the clinical domain, where a large gap between pre-training documents and target documents is observed. In this article, we aim to close this gap with domain-specific training of the language model, and we investigate its effect on a diverse set of downstream tasks and settings.

Authors

  • Lukas Lange
    Bosch Center for Artificial Intelligence, 71272 Renningen, Germany.
  • Heike Adel
    Bosch Center for Artificial Intelligence, 71272 Renningen, Germany.
  • Jannik Strötgen
    Bosch Center for Artificial Intelligence, 71272 Renningen, Germany.
  • Dietrich Klakow
    Spoken Language Systems (LSV), Saarland Informatics Campus, Saarland University, Saarbrücken, Germany.