Information Extraction from Medical Texts with BERT Using Human-in-the-Loop Labeling.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Neural network language models, such as BERT, can be used to extract information from unstructured free text in medical documents. These models can be pre-trained on a large corpus to learn the language and characteristics of the relevant domain and then fine-tuned with labeled data for a specific task. We propose a pipeline that uses human-in-the-loop labeling to create annotated data for information extraction from Estonian healthcare texts. This method is particularly useful for low-resource languages and is more accessible to people working in the medical field than rule-based methods such as regular expressions.

Authors

  • Hendrik Šuvalov
    University of Tartu, Estonia.
  • Sven Laur
    University of Tartu, Estonia.
  • Raivo Kolde
    University of Tartu, Estonia.