Improving broad-coverage medical entity linking with semantic type prediction and large-scale datasets.

Journal: Journal of Biomedical Informatics
Published Date:

Abstract

OBJECTIVES: Biomedical natural language processing tools are increasingly being applied for broad-coverage information extraction: extracting medical information of all types from a scientific document or a clinical note. In such broad-coverage settings, linking mentions of medical concepts to standardized vocabularies requires choosing the best candidate concepts from large inventories covering dozens of types. This study presents a novel semantic type prediction module for biomedical NLP pipelines and two automatically constructed, large-scale datasets with broad coverage of semantic types.

Authors

  • Shikhar Vashishth
    Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA. Electronic address: shikharvashishth@gmail.com.
  • Denis Newman-Griffis
    Rehabilitation Medicine Department, National Institutes of Health Clinical Center, Bethesda, MD, United States.
  • Rishabh Joshi
    Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA.
  • Ritam Dutt
    Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA.
  • Carolyn P Rosé
    Carnegie Mellon University, 5000 Forbes Ave, Pittsburgh, PA, USA.