Limitations of Transformers on Clinical Text Classification.

Journal: IEEE Journal of Biomedical and Health Informatics
Published Date:

Abstract

Bidirectional Encoder Representations from Transformers (BERT) and BERT-based approaches are the current state of the art in many natural language processing (NLP) tasks; however, their application to document classification on long clinical texts is limited. In this work, we introduce four methods to scale BERT, which by default can only handle input sequences up to approximately 400 words long, to perform document classification on clinical texts several thousand words long. We compare these methods against two much simpler architectures (a word-level convolutional neural network and a hierarchical self-attention network) and show that BERT often cannot beat these simpler baselines when classifying MIMIC-III discharge summaries and SEER cancer pathology reports. In our analysis, we show that two key components of BERT, pretraining and WordPiece tokenization, may actually be inhibiting BERT's performance on clinical text classification tasks where the input document is several thousand words long and where correctly identifying labels may depend more on identifying a few key words or phrases than on understanding the contextual meaning of sequences of text.
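
The abstract does not name the four scaling methods, so the following is only an illustrative sketch of one generic way to apply a length-limited classifier to a long document: split the text into fixed-size windows, score each window, and pool the scores. The function names, the 400-token window, the max-pooling rule, and the toy keyword scorer are all assumptions made for illustration; they are not taken from the paper, and a real setup would replace `toy_scorer` with a BERT-based model.

```python
from typing import Callable, List, Sequence


def split_into_chunks(tokens: Sequence[str], chunk_len: int = 400,
                      stride: int = 400) -> List[List[str]]:
    """Split tokens into windows of at most chunk_len tokens.

    A stride smaller than chunk_len produces overlapping windows.
    """
    if chunk_len <= 0 or stride <= 0:
        raise ValueError("chunk_len and stride must be positive")
    chunks = []
    for start in range(0, max(len(tokens), 1), stride):
        chunks.append(list(tokens[start:start + chunk_len]))
        if start + chunk_len >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


def classify_long_document(tokens: Sequence[str],
                           score_chunk: Callable[[List[str]], List[float]],
                           chunk_len: int = 400,
                           stride: int = 400) -> int:
    """Score every chunk, max-pool each label's score, return the best label."""
    chunk_scores = [score_chunk(c)
                    for c in split_into_chunks(tokens, chunk_len, stride)]
    num_labels = len(chunk_scores[0])
    pooled = [max(s[k] for s in chunk_scores) for k in range(num_labels)]
    return max(range(num_labels), key=pooled.__getitem__)


def toy_scorer(chunk: List[str]) -> List[float]:
    """Hypothetical stand-in for a model: label 0 fires on one key word."""
    return [1.0 if "carcinoma" in chunk else 0.0, 0.5]


# A 1000-token document whose only informative word sits near the end.
document = ["word"] * 1000
document[950] = "carcinoma"
predicted_label = classify_long_document(document, toy_scorer)
```

Max pooling is used here because it lets a single informative window decide the label, which mirrors the abstract's observation that labels may hinge on a few key words or phrases; mean pooling or a learned aggregator would be equally plausible choices.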

Authors

  • Shang Gao
    Department of Orthopedics, Orthopedic Center of Chinese PLA, Southwest Hospital, Third Military Medical University, Chongqing, 400038, P.R.China.
  • Mohammed Alawad
    Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
  • M Todd Young
  • John Gounley
    Advanced Computing for Health Sciences, Oak Ridge National Laboratory, Oak Ridge, TN 37830, United States.
  • Noah Schaefferkoetter
    Oak Ridge National Lab, Oak Ridge, TN, USA.
  • Hong Jun Yoon
    Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.
  • Xiao-Cheng Wu
    Department of Epidemiology, Louisiana State University New Orleans School of Public Health, New Orleans, LA 70112, United States.
  • Eric B Durbin
    University of Kentucky, Lexington, KY.
  • Jennifer Doherty
    Utah Cancer Registry, University of Utah School of Medicine, Salt Lake City, UT 84132, United States of America. Electronic address: Jen.Doherty@hci.utah.edu.
  • Antoinette Stroup
    New Jersey State Cancer Registry, Rutgers Cancer Institute of New Jersey, New Brunswick, NJ, 08901, United States of America. Electronic address: nan.stroup@rutgers.edu.
  • Linda Coyle
    Information Management Services Inc, Calverton, Maryland, USA.
  • Georgia Tourassi
    Computational Sciences and Engineering Division, Health Data Sciences Institute, Oak Ridge National Laboratory, Oak Ridge, Tennessee, USA.