Benchmarking for biomedical natural language processing tasks with a domain specific ALBERT.

Journal: BMC Bioinformatics
Published Date:

Abstract

BACKGROUND: The abundance of biomedical text data, coupled with advances in natural language processing (NLP), is producing novel biomedical NLP (BioNLP) applications. These NLP applications, or tasks, rely on the availability of domain-specific language models (LMs) trained on massive amounts of data. Most existing domain-specific LMs adopt the bidirectional encoder representations from transformers (BERT) architecture, which has limitations, and their generalizability is unproven because baseline results across common BioNLP tasks are absent.

Authors

  • Usman Naseem
    School of Computer Science, The University of Sydney, Sydney, Australia. usman.naseem@sydney.edu.au.
  • Adam G Dunn
    Centre for Health Informatics, Australian Institute of Health Innovation, Macquarie University, NSW, Australia.
  • Matloob Khushi
    Children's Medical Research Institute, The University of Sydney, Westmead, NSW, Australia. mkhushi@uni.sydney.edu.au.
  • Jinman Kim
School of Information Technologies, University of Sydney, Australia; Institute of Biomedical Engineering and Technology, University of Sydney, Australia. jinman.kim@sydney.edu.au.