Knowledge Extraction from MEDLINE by Combining Clustering with Natural Language Processing.

Journal: AMIA ... Annual Symposium proceedings. AMIA Symposium

Published Date: Nov 5, 2015

Abstract

Identifying the relevant predicates between co-occurring concepts in scientific literature databases such as MEDLINE is crucial for using these sources for knowledge extraction, that is, for obtaining meaningful biomedical predications in the form of subject-predicate-object triples. We consider the manually assigned MeSH indexing terms (main headings and subheadings) in MEDLINE records a rich resource for extracting a broad range of domain knowledge. In this paper, we explore combining a method that clusters co-occurring concepts according to their associated MeSH subheadings in MEDLINE with SemRep, a natural language processing engine that extracts predications from free-text documents. As a result, we generated sets of clusters of co-occurring concepts and identified the most significant predicates for each cluster. Associating these predicates with the co-occurrences in the resulting clusters yields a list of predications, which we checked for relevance.
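The abstract does not specify the clustering algorithm or the significance measure used to select predicates, so the following is only a minimal Python sketch of the overall pipeline under loudly stated assumptions. Each co-occurring MeSH pair is represented by counts of the subheading combinations attached to it; pairs are grouped by single-link clustering on cosine similarity (a stand-in technique, not necessarily the authors' method); and a predicate is kept for a cluster when it accounts for at least half of the SemRep predicates extracted for that cluster's pairs. The function names, the `threshold` and `min_share` parameters, and the toy data are all hypothetical; the data are invented for illustration and are not results from the paper.

```python
from collections import Counter
from math import sqrt

# Hypothetical types: a concept pair is (subject MeSH heading, object MeSH heading);
# its profile counts the subheading combinations seen with that pair in MEDLINE.
Pair = tuple[str, str]
Profile = dict[str, int]


def cosine(a: Profile, b: Profile) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster_pairs(profiles: dict[Pair, Profile], threshold: float = 0.5) -> list[list[Pair]]:
    """Single-link clustering (assumed method): any two pairs whose subheading
    profiles have cosine similarity >= threshold end up in the same cluster."""
    keys = list(profiles)
    parent = {k: k for k in keys}

    def find(k: Pair) -> Pair:
        # Union-find root lookup with path halving.
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for i, a in enumerate(keys):
        for b in keys[i + 1:]:
            if cosine(profiles[a], profiles[b]) >= threshold:
                parent[find(a)] = find(b)

    groups: dict[Pair, list[Pair]] = {}
    for k in keys:
        groups.setdefault(find(k), []).append(k)
    return sorted(sorted(g) for g in groups.values())


def top_predicates(cluster: list[Pair], semrep: dict[Pair, list[str]],
                   min_share: float = 0.5) -> list[tuple[str, float]]:
    """Hypothetical selection rule: keep predicates that make up at least
    `min_share` of all SemRep predicates extracted for the cluster's pairs."""
    counts: Counter[str] = Counter()
    for pair in cluster:
        counts.update(semrep.get(pair, []))
    total = sum(counts.values())
    return [(p, c / total) for p, c in counts.most_common() if c / total >= min_share]


def predications(cluster: list[Pair], predicates: list[tuple[str, float]]) -> list[tuple[str, str, str]]:
    """Attach each selected predicate to every co-occurrence in the cluster."""
    return [(s, p, o) for (s, o) in cluster for p, _ in predicates]


# Invented toy data: subheading profiles and SemRep predicates per pair.
profiles = {
    ("Metformin", "Diabetes Mellitus, Type 2"): {"therapeutic use|drug therapy": 5},
    ("Insulin", "Diabetes Mellitus, Type 1"): {"therapeutic use|drug therapy": 4,
                                               "administration & dosage|drug therapy": 1},
    ("BRCA1 Protein", "Breast Neoplasms"): {"genetics|genetics": 6},
}
semrep = {
    ("Metformin", "Diabetes Mellitus, Type 2"): ["TREATS", "TREATS", "INTERACTS_WITH"],
    ("Insulin", "Diabetes Mellitus, Type 1"): ["TREATS"],
    ("BRCA1 Protein", "Breast Neoplasms"): ["ASSOCIATED_WITH", "ASSOCIATED_WITH"],
}

for cluster in cluster_pairs(profiles):
    print(predications(cluster, top_predicates(cluster, semrep)))
```

With this toy input the two drug-therapy pairs fall into one cluster whose dominant predicate is TREATS, while the genetics pair forms its own cluster with ASSOCIATED_WITH. Swapping in a different clustering method or a statistical significance test would only change `cluster_pairs` or `top_predicates`; the final step of attaching the selected predicates to every pair in a cluster stays the same.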

Authors

Jose A Miñarro-Giménez
Institute of Medical Informatics, Statistics, and Documentation, Medical University of Graz, Austria.

Markus Kreuzthaler
Institute of Medical Informatics, Statistics, and Documentation, Medical University of Graz, Austria.

Stefan Schulz
Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Keywords

Cluster Analysis; Databases, Factual; Humans; Information Storage and Retrieval; Medical Subject Headings; MEDLINE; Natural Language Processing; Semantics

External Resources

PubMed ID (PMID): 26958228
