M-PreSS: A Model Pre-training Approach for Study Screening in Systematic Reviews

Journal: medRxiv

Abstract

Conducting a systematic review is labour-intensive and time-consuming, especially during the study screening stage. Previous research has applied traditional machine learning models (e.g. Support Vector Machines) to automate study screening, but these models are difficult to generalise across topics. Recent research has therefore explored the use of existing large language models (LLMs), such as ChatGPT/GPT-4, for study screening. However, the lack of transparency in their training data and the inconsistency of their outputs make such commercial LLMs difficult to apply to systematic reviews, where methodological transparency is particularly important. We introduce an approach that fine-tunes an open-source biomedical language model (BlueBERT) using a Siamese neural network [1] so that it can screen scientific literature databases across multiple research topics. We evaluate different training approaches on seven COVID-19 systematic reviews. The results indicate good generalisation across topics, with an average recall (sensitivity) of 0.86 (minimum: 0.67, maximum: 1.00) and an average false positive rate of 6.48% (minimum: 1.38%, maximum: 11.41%). Furthermore, adding study selection criteria to the topic definition can improve model performance (Area Under the Precision-Recall Curve [PRAUC]) by 2.74%, and adding more related review topics during training can increase performance by 15.82%. Our results indicate that a BlueBERT model fine-tuned on study screening datasets can outperform ChatGPT/GPT-4 on two of the three COVID-19 review topics reported in the literature, whilst preserving researchers' ability to continue updating or extending the search for related evidence and substantially reducing computational resource requirements.
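The screening metrics quoted above can be made concrete with a small sketch. It assumes, hypothetically, that the fine-tuned Siamese model maps the topic definition and each study's title/abstract to embedding vectors and scores each study by cosine similarity to the topic. The three-dimensional toy vectors, the labels and the 0.5 inclusion threshold are invented for illustration and are not the authors' data or pipeline. Recall and false positive rate use their standard definitions, TP/(TP+FN) and FP/(FP+TN), and PRAUC is estimated here as average precision, which is one common approximation.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def recall_and_fpr(labels, preds):
    """Recall (sensitivity) and false positive rate for include (1) / exclude (0) decisions."""
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    tn = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 0)
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr


def average_precision(labels, scores):
    """Estimate the area under the precision-recall curve as average precision."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(labels)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at each relevant study's rank
    return total / n_pos if n_pos else 0.0


# Hypothetical stand-ins for embeddings a fine-tuned BlueBERT encoder might produce.
topic_vec = [1.0, 0.0, 1.0]
study_vecs = [
    [0.9, 0.1, 0.8],  # included
    [0.0, 1.0, 0.1],  # excluded
    [0.7, 0.3, 0.9],  # included
    [0.2, 0.9, 0.0],  # excluded
    [0.6, 0.5, 0.4],  # included
    [0.8, 0.2, 0.6],  # excluded, but looks relevant
]
labels = [1, 0, 1, 0, 1, 0]

THRESHOLD = 0.5  # hypothetical inclusion cut-off
scores = [cosine(topic_vec, v) for v in study_vecs]
preds = [1 if s >= THRESHOLD else 0 for s in scores]
recall, fpr = recall_and_fpr(labels, preds)
ap = average_precision(labels, scores)
print(f"recall={recall:.2f} fpr={fpr:.2%} prauc~{ap:.3f}")
```

With these toy numbers, every included study is retrieved (recall 1.0). One of the three excluded studies is wrongly flagged (false positive rate 1/3). Average precision is about 0.806 because that excluded study is ranked second. This illustrates why PRAUC can separate models that have the same recall at a fixed threshold.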

Authors

  • Zhaozhen Xu; Philippa Davies; Louise AC Millard; Lam Teng; Georgios Markozannes; Pau Erola; Eduardo AP Seleiro; Julian PT Higgins; Richard M Martin; Maria Sobczyk-Barad; Konstantinos K Tsilidis; Doris SM Chan; Tom R Gaunt; Yi Liu