NLP-Assisted Pipeline for COVID-19 Core Outcome Set Identification Using ClinicalTrials.gov.

Journal: Studies in health technology and informatics
PMID:

Abstract

Core outcome sets (COSs) are necessary to ensure the systematic collection, metadata analysis and sharing of information across studies. However, developing a COS for a specific area of clinical research is costly and time-consuming. ClinicalTrials.gov, as a public repository, provides access to a vast collection of clinical trials and their characteristics, such as primary outcomes. With the growing number of COVID-19 clinical trials, identifying COSs from the outcomes of such trials is crucial. This paper introduces a semi-automatic pipeline that can efficiently identify, aggregate and rank COSs from the primary outcomes of COVID-19 clinical trials. Using natural language processing (NLP) techniques, our proposed pipeline downloads and processes 5090 trials from around the world and identifies COVID-19-specific outcomes that appear in more than 1% of the trials. The top-ranked outcomes identified by the pipeline are mortality due to COVID-19, COVID-19 infection rate and COVID-19 symptoms.
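
The aggregation and ranking step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the NLP method, so a simple keyword lookup stands in for it here. The names OUTCOME_KEYWORDS, normalize and rank_outcomes, the keyword lists, and the trial records in the usage note are all hypothetical. Only the 1% threshold comes from the abstract.

```python
from collections import Counter

# Hypothetical keyword map standing in for the NLP step; the abstract
# does not describe how outcome text is actually normalized.
OUTCOME_KEYWORDS = {
    "mortality": ("mortality", "death"),
    "infection rate": ("infection", "incidence"),
    "symptoms": ("symptom",),
}


def normalize(outcome_text):
    """Return the canonical outcome labels whose keywords occur in the text."""
    text = outcome_text.lower()
    return {
        label
        for label, keywords in OUTCOME_KEYWORDS.items()
        if any(keyword in text for keyword in keywords)
    }


def rank_outcomes(trials, min_share=0.01):
    """Rank outcome labels by the share of trials that report them.

    trials maps a trial identifier to its list of primary outcome strings.
    Each label is counted at most once per trial. Labels are kept only if
    they appear in more than min_share of the trials (1% by default).
    """
    total = len(trials)
    if total == 0:
        return []
    counts = Counter()
    for outcomes in trials.values():
        labels = set()
        for outcome in outcomes:
            labels |= normalize(outcome)
        counts.update(labels)
    ranked = [
        (label, count / total)
        for label, count in counts.items()
        if count / total > min_share
    ]
    # Highest share first; ties are broken alphabetically for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked
```

As a usage example with made-up records, calling rank_outcomes on four trials whose primary outcomes are "All-cause mortality at day 28", "Death within 30 days" together with "Number of symptomatic participants", "Incidence of SARS-CoV-2 infection", and "Time to symptom resolution" places mortality and symptoms first (each in half of the trials), followed by infection rate.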

Authors

  • Fatemeh Shah-Mohammadi
    Department of Biomedical Informatics, School of Medicine, University of Utah, USA.
  • Irena Parvanova
    Center for Biomedical and Population Health Informatics, Icahn School of Medicine at Mount Sinai, New York, NY, USA.
  • Joseph Finkelstein
    Department of Biomedical Informatics, School of Medicine, University of Utah, USA.