EvidenceOutcomes: A Dataset of Clinical Trial Publications with Clinically Meaningful Outcomes.

Journal: Studies in health technology and informatics
Published Date:

Abstract

The fundamental process of evidence extraction in evidence-based medicine relies on identifying PICO (Population, Intervention, Comparison, Outcome) elements, with Outcomes being the most complex and most often overlooked. To address this gap, we introduce EvidenceOutcomes, a large annotated corpus of clinically meaningful outcomes. A robust annotation guideline was developed in collaboration with clinicians and NLP experts, and three annotators annotated the Results and Conclusions of 500 PubMed abstracts and 140 EBM-NLP abstracts, achieving an inter-rater agreement of 0.76. A fine-tuned PubMedBERT model achieved F1 scores of 0.69 (entity level) and 0.76 (token level). EvidenceOutcomes offers a benchmark for advancing machine learning algorithms in extracting clinically meaningful outcomes.
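
The abstract reports F1 at both the entity level and the token level. The difference between the two can be illustrated with a minimal sketch. It assumes a single outcome label encoded with B/I/O tags and exact-span matching for entity-level scoring; the abstract does not state the evaluation protocol, so these choices, the function names, and the toy tag sequences are illustrative assumptions rather than the authors' method.

```python
from typing import List, Set, Tuple


def extract_spans(tags: List[str]) -> Set[Tuple[int, int]]:
    """Collect (start, end) entity spans, end exclusive, from B/I/O tags."""
    spans: Set[Tuple[int, int]] = set()
    start = None
    for i, tag in enumerate(tags):
        # A "B" always opens a span; a stray "I" with no open span is treated as "B".
        if tag == "B" or (tag == "I" and start is None):
            if start is not None:
                spans.add((start, i))
            start = i
        elif tag == "O" and start is not None:
            spans.add((start, i))
            start = None
    if start is not None:
        spans.add((start, len(tags)))
    return spans


def f1(true_pos: int, n_pred: int, n_gold: int) -> float:
    """Harmonic mean of precision and recall; 0.0 when there are no true positives."""
    if true_pos == 0:
        return 0.0
    precision = true_pos / n_pred
    recall = true_pos / n_gold
    return 2 * precision * recall / (precision + recall)


def token_f1(gold: List[str], pred: List[str]) -> float:
    """Score each token independently: is it inside an outcome mention or not?"""
    gold_pos = {i for i, t in enumerate(gold) if t != "O"}
    pred_pos = {i for i, t in enumerate(pred) if t != "O"}
    return f1(len(gold_pos & pred_pos), len(pred_pos), len(gold_pos))


def entity_f1(gold: List[str], pred: List[str]) -> float:
    """Score whole mentions: a prediction counts only if both boundaries match."""
    gold_spans = extract_spans(gold)
    pred_spans = extract_spans(pred)
    return f1(len(gold_spans & pred_spans), len(pred_spans), len(gold_spans))


# Toy example (hypothetical tags): the first predicted mention is one token short.
gold = ["O", "B", "I", "I", "O", "B", "I"]
pred = ["O", "B", "I", "O", "O", "B", "I"]

print(f"token-level F1:  {token_f1(gold, pred):.3f}")
print(f"entity-level F1: {entity_f1(gold, pred):.3f}")
```

In this toy case a single boundary error costs one token at the token level but an entire mention at the entity level, which is consistent with entity-level scores typically being the lower of the two.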

Authors

  • Yiliang Zhou
    Department of Population Health Sciences, Weill Cornell Medicine, New York, NY 10065, United States.
  • Abigail M Newbury
    Columbia University.
  • Gongbo Zhang
    Department of Biomedical Informatics, Columbia University Irving Medical Center, New York, New York, United States.
  • Betina Ross Idnay
    School of Nursing.
  • Hao Liu
    Key Laboratory of Development and Maternal and Child Diseases of Sichuan Province, Department of Pediatrics, Sichuan University, Chengdu, China.
  • Chunhua Weng
    Department of Biomedical Informatics, Columbia University.
  • Yifan Peng
    Department of Population Health Sciences, Weill Cornell Medicine, New York, USA.