An annotated corpus of clinical trial publications supporting schema-based relational information extraction.
Journal:
Journal of biomedical semantics
Published Date:
May 23, 2022
Abstract
BACKGROUND: The evidence-based medicine paradigm requires the ability to aggregate and compare outcomes of interventions across different trials. This can be facilitated and partially automated by information extraction systems. To support the development of systems that can extract information from published clinical trials at a fine-grained and comprehensive level to populate a knowledge base, we present a corpus that is richly annotated at two levels. At the first level, entities that describe components of the PICO elements (e.g., a population's age and pre-conditions, or the dosage of a treatment) are annotated. The second level comprises schema-level annotations (i.e., slot-filling templates) corresponding to complex PICO elements and other concepts related to a clinical trial (e.g., the relation between an intervention and an arm, or the relation between an outcome and an intervention).
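
The two annotation levels can be pictured as a small data structure. The following is a minimal sketch and not the corpus's actual file format: the class names (`Entity`, `Template`), the entity labels, the slot names, and the example sentence are all hypothetical and chosen only for illustration. It assumes that first-level entities are typed text spans and that second-level templates are records whose slots are filled with entities or with other templates.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Union


@dataclass(frozen=True)
class Entity:
    """Level 1 (hypothetical representation): a typed span of text."""
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str   # illustrative entity type, e.g. "DoseValue"
    text: str


@dataclass
class Template:
    """Level 2 (hypothetical representation): a slot-filling template.

    Each slot maps to a list of fillers, which may be entities or
    nested templates, so relations between concepts become slots.
    """
    type: str
    slots: Dict[str, List[Union[Entity, "Template"]]] = field(default_factory=dict)

    def fill(self, slot: str, filler: Union[Entity, "Template"]) -> None:
        # Append the filler, creating the slot on first use.
        self.slots.setdefault(slot, []).append(filler)


def find_entity(sentence: str, phrase: str, label: str) -> Entity:
    """Build an Entity from the first occurrence of phrase in sentence."""
    start = sentence.index(phrase)
    return Entity(start, start + len(phrase), label, phrase)


# Invented example sentence, not taken from the corpus.
sentence = "Patients aged 40 to 65 years received 10 mg of drug X daily."

# Level 1: entity annotations.
age = find_entity(sentence, "40 to 65 years", "AgeRange")
dose = find_entity(sentence, "10 mg", "DoseValue")
drug = find_entity(sentence, "drug X", "Drug")

# Level 2: templates that group entities and link to each other.
population = Template("Population")
population.fill("age", age)

intervention = Template("Intervention")
intervention.fill("drug", drug)
intervention.fill("dose", dose)

# The relation between an intervention and an arm is expressed as a slot.
arm = Template("Arm")
arm.fill("intervention", intervention)
```

In this sketch, a relation such as the one between an intervention and an arm is not a separate edge type but a slot on the `Arm` template whose filler is another template, which is one common way to encode slot-filling schemas.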