SECNLP: A survey of embeddings in clinical natural language processing.

Journal: Journal of Biomedical Informatics

Abstract

Distributed vector representations, or embeddings, map variable-length text to dense fixed-length vectors and capture prior knowledge that can be transferred to downstream tasks. Even though embeddings have become the de facto standard for text representation in deep learning-based NLP tasks in both general and clinical domains, there is no survey paper that presents a detailed review of embeddings in Clinical Natural Language Processing. In this survey paper, we discuss various medical corpora and their characteristics, as well as medical codes, and present a brief overview and comparison of popular embedding models. We classify clinical embeddings and discuss each embedding type in detail. We then discuss various evaluation methods, followed by possible solutions to various challenges in clinical embeddings. Finally, we conclude with some future directions that will advance research in clinical embeddings.
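To make the opening idea concrete (variable-length text mapped to a dense fixed-length vector), here is a minimal sketch under stated assumptions. It is not the authors' method or any model reviewed in the survey: the word vectors are invented toy values, and averaging word vectors (mean pooling) is just one simple way to obtain a fixed-length text vector. Cosine similarity is included because it is a common way to compare embeddings.

```python
import math

# Hypothetical 4-dimensional word vectors (invented toy values).
# Real clinical embeddings are learned from corpora and are much larger.
TOY_EMBEDDINGS = {
    "patient": [0.2, 0.1, 0.7, 0.0],
    "denies":  [0.0, 0.5, 0.1, 0.3],
    "chest":   [0.6, 0.2, 0.0, 0.1],
    "pain":    [0.5, 0.3, 0.1, 0.2],
    "fever":   [0.4, 0.4, 0.2, 0.1],
}
DIM = 4


def embed_text(text: str) -> list[float]:
    """Map variable-length text to one fixed-length vector by averaging
    the vectors of known tokens (mean pooling); unknown tokens are skipped."""
    vectors = [TOY_EMBEDDINGS[t] for t in text.lower().split() if t in TOY_EMBEDDINGS]
    if not vectors:
        # No known tokens: fall back to a zero vector of the same length.
        return [0.0] * DIM
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    short = embed_text("chest pain")
    longer = embed_text("patient denies chest pain")
    print(len(short), len(longer))  # prints "4 4": same length for both inputs
    print(round(cosine_similarity(short, longer), 3))
```

Whatever the input length, the output vector has the same dimensionality, which is what lets downstream models consume it directly.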

Authors

  • Katikapalli Subramanyam Kalyan
    Text Analytics and NLP Lab, Department of Computer Applications, NIT Trichy, India. Electronic address: kalyan.ks@yahoo.com.
  • S Sangeetha
    Text Analytics and NLP Lab, Department of Computer Applications, NIT Trichy, India. Electronic address: sangeetha@nitt.edu.