Machine learning for identifying Randomized Controlled Trials: An evaluation and practitioner's guide.

Journal: Research Synthesis Methods

Published Date: Dec 1, 2018

Abstract

Machine learning (ML) algorithms have proven highly accurate for identifying Randomized Controlled Trials (RCTs) but are not used much in practice, in part because the best way to make use of the technology in a typical workflow is unclear. In this work, we evaluate ML models for RCT classification (support vector machines, convolutional neural networks, and ensemble approaches). We trained and optimized support vector machine and convolutional neural network models on the titles and abstracts of the Cochrane Crowd RCT set. We evaluated the models on an external dataset (Clinical Hedges), allowing direct comparison with traditional database search filters, and estimated the area under the receiver operating characteristic curve (AUROC) on that dataset. We demonstrate that ML approaches better discriminate between RCTs and non-RCTs than widely used traditional database search filters at all sensitivity levels; our best-performing model also achieved the best results to date for ML in this task (AUROC 0.987, 95% CI, 0.984-0.989). We provide practical guidance on the role of ML in (1) systematic reviews (high-sensitivity strategies) and (2) rapid reviews and clinical question answering (high-precision strategies), together with recommended probability cutoffs for each use case. Finally, we provide open-source software to enable these approaches to be used in practice.
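
The relationship between AUROC, probability cutoffs, and the two use cases named in the abstract can be illustrated with a minimal sketch. It is not the authors' software and does not reproduce their recommended cutoffs: the function names, the toy labels, and the toy scores are all hypothetical, and the code assumes only that a classifier returns one probability-like score per article and that an article is called an RCT when its score is at or above the cutoff.

```python
import math
from typing import List, Tuple


def auroc(labels: List[int], scores: List[float]) -> float:
    """AUROC as the chance that a random RCT outscores a random non-RCT.

    Ties count one half (the Mann-Whitney formulation). Labels: 1 = RCT.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one RCT and one non-RCT")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def cutoff_for_sensitivity(labels: List[int], scores: List[float],
                           target: float) -> float:
    """Highest cutoff whose sensitivity is still at least `target`."""
    if not 0.0 < target <= 1.0:
        raise ValueError("target must be in (0, 1]")
    pos = sorted((s for y, s in zip(labels, scores) if y == 1), reverse=True)
    if not pos:
        raise ValueError("need at least one RCT")
    needed = max(1, math.ceil(target * len(pos)))  # RCTs that must be kept
    return pos[needed - 1]


def sensitivity_precision(labels: List[int], scores: List[float],
                          cutoff: float) -> Tuple[float, float]:
    """Sensitivity and precision when predicting RCT for score >= cutoff."""
    tp = sum(1 for y, s in zip(labels, scores) if y == 1 and s >= cutoff)
    fp = sum(1 for y, s in zip(labels, scores) if y == 0 and s >= cutoff)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < cutoff)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return sensitivity, precision


# Hypothetical toy data: 4 RCTs and 6 non-RCTs with made-up scores.
TOY_LABELS = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
TOY_SCORES = [0.95, 0.90, 0.60, 0.20, 0.70, 0.30, 0.10, 0.05, 0.02, 0.01]

if __name__ == "__main__":
    print("AUROC:", auroc(TOY_LABELS, TOY_SCORES))
    for target in (1.0, 0.5):  # high-sensitivity vs high-precision setting
        cut = cutoff_for_sensitivity(TOY_LABELS, TOY_SCORES, target)
        print(target, cut, sensitivity_precision(TOY_LABELS, TOY_SCORES, cut))
```

On the toy data, the low cutoff keeps every RCT at the cost of admitting some non-RCTs, while the high cutoff admits no non-RCTs but misses half of the RCTs. This is the same trade-off that separates a systematic-review setting from a rapid-review setting; cutoffs for real use should be chosen on a labelled validation set rather than taken from this sketch.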

Authors

Iain J Marshall

Department of Primary Care and Public Health Sciences, King's College London, UK. iain.marshall@kcl.ac.uk

Anna Noel-Storr

Cochrane Dementia and Cognitive Improvement Group, University of Oxford, United Kingdom.

Joel Kuiper

Department of Genetics, Genomics Coordination Center, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands; Department of Epidemiology, University of Groningen, University Medical Center Groningen, Groningen, The Netherlands.

James Thomas

EPPI-Centre, Social Research Institute, University College London, London, England, UK.

Byron C Wallace

School of Information, University of Texas at Austin, Austin, Texas, USA.

Keywords

Algorithms; Databases, Bibliographic; Evidence-Based Medicine; Humans; Information Storage and Retrieval; Machine Learning; Randomized Controlled Trials as Topic; Registries; Reproducibility of Results; Review Literature as Topic; ROC Curve; Search Engine; Sensitivity and Specificity; Subject Headings; Support Vector Machine

External Resources

PubMed ID: 29314757
