Use of deep learning-based NLP models for full-text data elements extraction for systematic literature review tasks.

Journal: Scientific Reports

Abstract

Systematic literature review (SLR) is an important tool for evidence synthesis in Health Economics and Outcomes Research (HEOR). SLRs involve identifying and selecting pertinent publications and extracting relevant data elements from full-text articles, which can be a manually intensive procedure. We previously developed machine learning models to automatically identify relevant publications based on pre-specified inclusion and exclusion criteria. This study investigates the feasibility of applying Natural Language Processing (NLP) approaches to automatically extract data elements from the relevant scientific literature. First, 239 full-text articles were collected and annotated for 12 important variables, including study cohort, lab technique, and disease type, to support SLR summaries of Human papillomavirus (HPV) Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden. The three resulting annotated corpora are shared publicly at [ https://github.com/Merck/NLP-SLR-corpora ] to provide training data and a benchmark baseline for the NLP community to further research this challenging task. We then compared three classic Named Entity Recognition (NER) algorithms, namely Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT) models, on the data element extraction task. The annotated corpora contain 4,498, 579, and 252 annotated entity mentions for the HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden tasks, respectively. Deep learning algorithms achieved superior performance in recognizing the targeted SLR data elements compared with conventional machine learning algorithms: the LSTM models achieved micro-averaged F1 scores of 0.890, 0.646, and 0.615 on the three tasks, respectively, whereas the CRF models could not provide comparable performance on most of the elements of interest. Although BERT-based models are known to achieve superior performance on many NLP tasks, we did not observe an improvement on our three tasks. In summary, deep learning algorithms outperformed conventional machine learning models on multiple SLR data element extraction tasks. The LSTM model in particular is preferable for deployment in supporting HEOR SLR data element extraction, given its better performance, generalizability, and cost-effective scalability on our SLR benchmark datasets.
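The micro-averaged F1 scores reported above pool true positives, false positives, and false negatives across all documents and entity types before computing F1, so frequent entity types dominate the score. A minimal sketch of this metric over extracted entity mentions (not the authors' evaluation code; the spans and labels below are hypothetical):

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-document sets of (span, label)
    entity mentions, pooled across documents and entity types."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)  # correctly extracted mentions
        fp += len(p - g)  # spurious extractions
        fn += len(g - p)  # missed mentions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Hypothetical gold vs. predicted mentions for two articles;
# each mention is a ((start, end) character span, label) pair.
gold = [
    [((12, 25), "STUDY_COHORT"), ((40, 53), "LAB_TECHNIQUE")],
    [((5, 9), "DISEASE_TYPE")],
]
pred = [
    [((12, 25), "STUDY_COHORT")],
    [((5, 9), "DISEASE_TYPE"), ((30, 34), "LAB_TECHNIQUE")],
]
print(round(micro_f1(gold, pred), 3))  # → 0.667
```

Here exact span-and-label matches count as true positives; partial-match variants of this metric exist, and the paper does not specify which convention was used.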

Authors

  • Jingcheng Du
    University of Texas Health Science Center at Houston, Houston, Texas, USA.
  • Dong Wang
    Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China.
  • Bin Lin
Department of Biostatistics, Hospital for Special Surgery, 535 E 70th Street, New York, NY 10021, United States of America.
  • Long He
    IMO Health, Inc., Rosemont, IL 60018, United States.
  • Liang-Chin Huang
    School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Jingqi Wang
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Frank J Manion
    School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas, United States.
  • Yeran Li
    Harvard T.H. Chan School of Public Health, Cambridge, MA, USA.
  • Nicole Cossrow
    Merck & Co., Inc., Rahway, NJ, USA.
  • Lixia Yao
    Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.