Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data.

Journal: Pacific Symposium on Biocomputing

Published Date: Jan 1, 2020

Abstract

Word embeddings are a popular approach to unsupervised learning of word relationships that are widely used in natural language processing. In this article, we present a new set of embeddings for medical concepts learned using an extremely large collection of multimodal medical data. Leaning on recent theoretical insights, we demonstrate how an insurance claims database of 60 million members, a collection of 20 million clinical notes, and 1.7 million full-text biomedical journal articles can be combined to embed concepts into a common space, resulting in the largest ever set of embeddings for 108,477 medical concepts. To evaluate our approach, we present a new benchmark methodology based on statistical power specifically designed to test embeddings of medical concepts. Our approach, called cui2vec, attains state-of-the-art performance relative to previous methods in most instances. Finally, we provide a downloadable set of pre-trained embeddings for other researchers to use, as well as an online tool for interactive exploration of the cui2vec embeddings.
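The abstract does not spell out the embedding algorithm. One well-known technique that fits "recent theoretical insights" is Levy and Goldberg's result that skip-gram with negative sampling implicitly factorizes a shifted pointwise mutual information (PMI) matrix, which lets embeddings be computed directly from concept co-occurrence counts. The following is a minimal sketch under that assumption, not the authors' exact pipeline. It builds a shifted positive PMI (SPPMI) matrix from a symmetric concept-by-concept count matrix, factorizes it with a truncated SVD, and compares concepts by cosine similarity. The function names `sppmi`, `embed`, and `cosine`, the `shift` parameter, and the toy counts are all hypothetical illustrations, and NumPy is assumed to be available.

```python
import numpy as np


def sppmi(cooc, shift=1.0):
    """Shifted positive PMI: max(PMI(i, j) - log(shift), 0). (Illustrative helper.)"""
    cooc = np.asarray(cooc, dtype=float)
    total = cooc.sum()
    row = cooc.sum(axis=1, keepdims=True)  # marginal count of concept i
    col = cooc.sum(axis=0, keepdims=True)  # marginal count of concept j
    with np.errstate(divide="ignore", invalid="ignore"):
        # PMI(i, j) = log( n_ij * N / (n_i * n_j) )
        pmi = np.log(cooc * total) - np.log(row * col)
    pmi[~np.isfinite(pmi)] = 0.0  # pairs never seen together contribute nothing
    return np.maximum(pmi - np.log(shift), 0.0)


def embed(cooc, dim, shift=1.0):
    """Rank-`dim` embedding U_d * sqrt(S_d) from the SVD of the SPPMI matrix."""
    u, s, _ = np.linalg.svd(sppmi(cooc, shift))
    # Singular values are returned in descending order, so keep the leading `dim`.
    return u[:, :dim] * np.sqrt(s[:dim])


def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Hypothetical symmetric co-occurrence counts for five concepts.
# Concepts 0 and 1 have identical co-occurrence profiles.
counts = np.array([
    [0, 0, 8, 2, 0],
    [0, 0, 8, 2, 0],
    [8, 8, 0, 0, 1],
    [2, 2, 0, 0, 6],
    [0, 0, 1, 6, 0],
])
vectors = embed(counts, dim=4)
print(round(cosine(vectors[0], vectors[1]), 3))
print(round(cosine(vectors[0], vectors[4]), 3))
```

Because concepts 0 and 1 share the same co-occurrence profile, they receive the same vector, while concept 4, which co-occurs with different concepts, lands far from them. Factorizing an explicit count matrix in this way means heterogeneous sources (claims, notes, articles) only need to be reduced to co-occurrence counts before embedding. How the counts from each source are defined and combined is not specified here.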

Authors

Andrew L Beam

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA.
Benjamin Kompa
Allen Schmaltz
Inbar Fried
Griffin Weber
Nathan Palmer
Xu Shi

Research Institute for Electronic Science, Hokkaido University, Hokkaido 060-0808, Japan.
Tianxi Cai

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, United States.
Isaac S Kohane

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, USA. Isaac_Kohane@hms.harvard.edu.

Keywords

Computational Biology; Databases, Factual; Humans; Natural Language Processing

External Resources

PubMed (31797605)
