When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.

Journal: BMC Medical Informatics and Decision Making

Abstract

BACKGROUND: Natural language processing (NLP) tasks in the health domain often deal with limited amounts of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at a scale that is unlikely to be available in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP.
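To make the transfer-learning setup described above concrete, the sketch below fine-tunes a pretrained BERT encoder on a tiny labeled disease-classification set using the Hugging Face transformers library. This is not the authors' code; the checkpoint name, number of disease classes, example texts, labels, and hyperparameters are all illustrative assumptions.

```python
# Minimal sketch (assumed setup, not the paper's implementation): fine-tune a
# pretrained BERT classifier on a small labeled disease-classification dataset.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed checkpoint; a domain-specific variant could be swapped in
NUM_LABELS = 10                   # assumed number of disease classes

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=NUM_LABELS)

# Hypothetical small labeled set: clinical-style snippets with disease labels.
texts = ["patient reports persistent cough and fever", "elevated fasting glucose noted"]
labels = torch.tensor([3, 7])

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

model.train()
for _ in range(3):  # a few epochs, as is typical when labeled data are scarce
    outputs = model(**batch, labels=labels)   # forward pass with cross-entropy loss
    outputs.loss.backward()                   # backpropagate into the pretrained encoder
    optimizer.step()
    optimizer.zero_grad()
```

A learning-curve analysis like the one the paper describes would repeat this fine-tuning over progressively larger subsets of the labeled data and track held-out classification performance at each size.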

Authors

  • Xuedong Li
    College of Computer Science, Sichuan University, Chengdu, China.
  • Walter Yuan
    MobLab Inc., Pasadena, CA, United States.
  • Dezhong Peng
    College of Computer Science, Sichuan University, Chengdu, China.
  • Qiaozhu Mei
University of Michigan, Ann Arbor, MI, United States.
  • Yue Wang
    Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.