Use of deep learning-based NLP models for full-text data elements extraction for systematic literature review tasks.

Journal: Scientific Reports

Abstract

Systematic literature review (SLR) is an important tool for evidence synthesis in Health Economics and Outcomes Research (HEOR). SLRs involve identifying and selecting pertinent publications and extracting relevant data elements from full-text articles, which can be a manually intensive procedure. We previously developed machine learning models to automatically identify relevant publications based on pre-specified inclusion and exclusion criteria. This study investigates the feasibility of applying Natural Language Processing (NLP) approaches to automatically extract data elements from the relevant scientific literature. First, 239 full-text articles were collected and annotated for 12 important variables, including study cohort, lab technique, and disease type, to support SLR summaries of Human papillomavirus (HPV) Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden. The three resulting annotated corpora are shared publicly at [ https://github.com/Merck/NLP-SLR-corpora ] to provide training data and a benchmark baseline for the NLP community to further research this challenging task. We then compared three classic Named Entity Recognition (NER) algorithms, namely Conditional Random Fields (CRF), Long Short-Term Memory (LSTM), and Bidirectional Encoder Representations from Transformers (BERT) models, on the data element extraction task. The annotated corpora contain 4,498, 579, and 252 annotated entity mentions for the HPV Prevalence, Pneumococcal Epidemiology, and Pneumococcal Economic Burden tasks, respectively. Deep learning algorithms achieved superior performance in recognizing the targeted SLR data elements compared with conventional machine learning algorithms: the LSTM models achieved micro-averaged F1 scores of 0.890, 0.646, and 0.615 on the three tasks, respectively, whereas the CRF models could not provide comparable performance on most of the elements of interest. Although BERT-based models are known to achieve superior performance on many NLP tasks, we did not observe an improvement on our three tasks. In summary, deep learning algorithms outperformed conventional machine learning models on multiple SLR data element extraction tasks. The LSTM model in particular is preferable for deployment in supporting HEOR SLR data element extraction, given its better performance, generalizability, and cost-effective scalability on our SLR benchmark datasets.
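The micro-averaged F1 scores reported above pool true positives, false positives, and false negatives across all documents and entity types before computing F1, so frequent entity types dominate the score. A minimal sketch of this metric over extracted entity mentions (not the authors' evaluation code; the spans and labels below are hypothetical):

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over per-document sets of (span, label)
    entity mentions, pooled across documents and entity types."""
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        g, p = set(g), set(p)
        tp += len(g & p)  # correctly extracted mentions
        fp += len(p - g)  # spurious extractions
        fn += len(g - p)  # missed mentions
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return (2 * precision * recall / (precision + recall)
            if precision + recall else 0.0)

# Hypothetical gold vs. predicted mentions for two articles;
# each mention is a ((start, end) character span, label) pair.
gold = [
    [((12, 25), "STUDY_COHORT"), ((40, 53), "LAB_TECHNIQUE")],
    [((5, 9), "DISEASE_TYPE")],
]
pred = [
    [((12, 25), "STUDY_COHORT")],
    [((5, 9), "DISEASE_TYPE"), ((30, 34), "LAB_TECHNIQUE")],
]
print(round(micro_f1(gold, pred), 3))  # → 0.667
```

Here exact span-and-label matches count as true positives; partial-match variants of this metric exist, and the paper does not specify which convention was used.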

Authors

  • Jingcheng Du
    University of Texas Health Science Center at Houston, Houston, Texas, USA.
  • Dong Wang
    Department of Neurosurgery, Tianjin Medical University General Hospital, Tianjin, China.
  • Bin Lin
Department of Biostatistics, Hospital for Special Surgery, 535 E 70th Street, New York, NY 10021, United States of America.
  • Long He
    IMO Health, Inc., Rosemont, IL 60018, United States.
  • Liang-Chin Huang
    School of Biomedical Informatics, University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Jingqi Wang
    School of Biomedical Informatics, The University of Texas Health Science Center at Houston, Houston, TX, USA.
  • Frank J Manion
    School of Biomedical Informatics, University of Texas Health Science Center, Houston, Texas, United States.
  • Yeran Li
    Harvard T.H. Chan School of Public Health, Cambridge, MA, USA.
  • Nicole Cossrow
    Merck & Co., Inc., Rahway, NJ, USA.
  • Lixia Yao
    Department of Health Sciences Research, Mayo Clinic, Rochester, MN 55905, USA.