Deep learning with sentence embeddings pre-trained on biomedical corpora improves the performance of finding similar sentences in electronic medical records.

Journal: BMC Medical Informatics and Decision Making

Abstract

BACKGROUND: Capturing sentence semantics plays a vital role in a range of text mining applications. Despite continued efforts to develop datasets and models for the general domain, both remain limited in the biomedical and clinical domains. The BioCreative/OHNLP2018 organizers made the first attempt to annotate 1068 sentence pairs from clinical notes and called for a community effort to tackle the Semantic Textual Similarity (BioCreative/OHNLP STS) challenge.

Authors

  • Qingyu Chen
    Department of Biomedical Informatics and Data Science, Yale School of Medicine, Yale University, New Haven, CT, USA.
  • Jingcheng Du
    University of Texas Health Science Center at Houston, Houston, Texas, USA.
  • Sun Kim
    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. sun.kim@nih.gov.
  • W John Wilbur
    National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA. wilbur@ncbi.nlm.nih.gov.
  • Zhiyong Lu
    National Center for Biotechnology Information, Bethesda, MD 20894, USA.