When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.

Journal: BMC medical informatics and decision making

Published Date: Apr 5, 2022

Abstract

BACKGROUND: Natural language processing (NLP) tasks in the health domain often deal with a limited amount of labeled data due to high annotation costs and naturally rare observations. To compensate for the lack of training data, health NLP researchers often have to leverage knowledge and resources external to the task at hand. Recently, pretrained large-scale language models such as the Bidirectional Encoder Representations from Transformers (BERT) have proven to be a powerful way of learning rich linguistic knowledge from massive unlabeled text and transferring that knowledge to downstream tasks. However, previous downstream tasks often used training data at a scale that is unlikely to be obtainable in the health domain. In this work, we aim to study whether BERT can still benefit downstream tasks when training data are relatively small in the context of health NLP.

Authors

Xuedong Li
College of Computer Science, Sichuan University, Chengdu, China.

Walter Yuan
MobLab Inc., Pasadena, CA, United States.

Dezhong Peng
College of Computer Science, Sichuan University, Chengdu, China.

Qiaozhu Mei
University of Michigan, Ann Arbor, MI.

Yue Wang
Department of Pharmacology, University of North Carolina at Chapel Hill, Chapel Hill, NC, United States.

Keywords

Humans; Language; Learning Curve; Natural Language Processing

External Resources

PubMed ID: 35382811
