Automated learning of domain taxonomies from text using background knowledge.

Journal: Journal of Biomedical Informatics

Published Date: Sep 3, 2016

Abstract

In this paper, we present an automated method for taxonomy learning, focusing on concept formation and hierarchical relation learning. To infer such relations, we partition the extracted concepts and group them into closely related clusters using Hierarchical Agglomerative Clustering, informed by syntactic matching and semantic relatedness functions. We introduce a novel, unsupervised method for cluster detection based on automated dendrogram pruning, which adapts dynamically to each partition. We evaluate our approach on two different types of textual corpora: clinical trial descriptions and MEDLINE publication abstracts. The results of several experiments indicate that our method outperforms existing dynamic pruning methods and state-of-the-art taxonomy learning methods. It yields higher concept coverage (95.75%) and higher accuracy of learned taxonomic relations (up to 0.71 average precision and 0.96 average recall).
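The clustering step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the token-overlap distance is an assumed stand-in for their syntactic matching and semantic relatedness functions, the fixed `max_distance` threshold replaces their dynamic, per-partition dendrogram pruning, and the function names and example concepts are hypothetical.

```python
from itertools import combinations


def jaccard_distance(a: str, b: str) -> float:
    """Token-overlap distance; an assumed stand-in for the paper's functions."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return 1.0 - len(ta & tb) / len(ta | tb)


def average_linkage(c1, c2) -> float:
    """Mean pairwise distance between the members of two clusters."""
    total = sum(jaccard_distance(x, y) for x in c1 for y in c2)
    return total / (len(c1) * len(c2))


def agglomerate(concepts, max_distance=0.7):
    """Merge the closest pair of clusters until none is within max_distance.

    With average linkage, stopping at a fixed threshold amounts to cutting
    the dendrogram at that height. The paper's dynamic pruning is NOT
    reproduced here; the fixed threshold is an illustrative assumption.
    """
    clusters = [[c] for c in concepts]
    while len(clusters) > 1:
        # Find the pair of clusters with the smallest linkage distance.
        (i, j), dist = min(
            (
                ((i, j), average_linkage(clusters[i], clusters[j]))
                for i, j in combinations(range(len(clusters)), 2)
            ),
            key=lambda item: item[1],
        )
        if dist > max_distance:
            break
        # i < j, so deleting index j leaves index i valid.
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


if __name__ == "__main__":
    concepts = [
        "heart failure",
        "chronic heart failure",
        "acute heart failure",
        "breast cancer",
        "metastatic breast cancer",
    ]
    for cluster in agglomerate(concepts):
        print(cluster)
```

On these five example concepts the sketch separates the heart failure terms from the breast cancer terms. A single global threshold is the simplest way to cut a dendrogram; the method in the paper instead chooses the cut automatically for each partition.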

Authors

Julia Hoxha
Department of Biomedical Informatics, Columbia University, New York, NY, USA.

Guoqian Jiang
Mayo Clinic College of Medicine, Rochester, MN, USA.

Chunhua Weng
Department of Biomedical Informatics, Columbia University.

Keywords

Cluster Analysis; Electronic Data Processing; Humans; Knowledge; MEDLINE; Semantics; Unsupervised Machine Learning

External Resources

PubMed ID: 27597572
