From lexical regularities to axiomatic patterns for the quality assurance of biomedical terminologies and ontologies.

Journal: Journal of biomedical informatics
Published Date:

Abstract

Ontologies and terminologies have been identified as key resources for achieving semantic interoperability in biomedical domains. Ontologies are developed jointly by domain experts and knowledge engineers. These experts are also responsible for maintaining and auditing the resources, which is usually a time-consuming, mostly manual task. Manual auditing is impractical and ineffective for most biomedical ontologies, especially the larger ones. An example is SNOMED CT, a key resource in many countries for codifying medical information. SNOMED CT contains more than 300,000 concepts, so auditing it requires the support of automatic methods. Many biomedical ontologies contain natural language content for humans and logical axioms for machines. The 'lexically suggest, logically define' principle states that there should be a relation between what is expressed in natural language and what is expressed as logical axioms, and that such a relation should be useful for auditing and quality assurance. The principle also implies that the natural language content for humans could be used to generate the logical axioms for machines. In this work, we propose a method that combines lexical analysis and clustering techniques to (1) identify regularities in the natural language content of ontologies; (2) cluster, by similarity, labels exhibiting a regularity; (3) extract relevant information from those clusters; and (4) propose logical axioms for each cluster with the support of axiom templates. These logical axioms can then be compared with the existing axioms in the ontology to check their correctness and completeness, which are two fundamental objectives in auditing and quality assurance.
In this paper, we describe the application of the method to two SNOMED CT modules: a 'congenital' module, obtained using concepts exhibiting the attribute Occurrence - Congenital, and a 'chronic' module, obtained using concepts exhibiting the attribute Clinical course - Chronic. We obtained a precision of 75% and a recall of 28% for the 'congenital' module, and a precision of 64% and a recall of 40% for the 'chronic' one. We consider these results promising: the method can support content editors by providing automatic means of assuring the quality of biomedical ontologies and terminologies.
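
The four steps and the precision/recall evaluation described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It rests on several assumptions: the labels and existing axioms are made-up toy data rather than real SNOMED CT content, grouping labels by a shared token stands in for the similarity clustering of step (2), and the `TEMPLATES` table and all function names are hypothetical.

```python
from collections import defaultdict

def find_regularities(labels, min_support=2):
    """Step 1 (simplified): tokens shared by at least min_support labels."""
    index = defaultdict(set)
    for label in labels:
        for token in set(label.lower().split()):
            index[token].add(label)
    return {tok: labs for tok, labs in index.items() if len(labs) >= min_support}

# Hypothetical axiom templates: lexical regularity -> (attribute, value).
TEMPLATES = {
    "congenital": ("Occurrence", "Congenital"),
    "chronic": ("Clinical course", "Chronic"),
}

def propose_axioms(regularities, templates):
    """Steps 2-4 (simplified): treat the labels sharing a token as one
    cluster and instantiate the matching template for every label in it."""
    proposed = set()
    for token, cluster in regularities.items():
        if token in templates:
            attribute, value = templates[token]
            for label in cluster:
                proposed.add((label, attribute, value))
    return proposed

def precision_recall(proposed, existing):
    """Compare proposed axioms with those already asserted in the ontology."""
    true_positives = len(proposed & existing)
    precision = true_positives / len(proposed) if proposed else 0.0
    recall = true_positives / len(existing) if existing else 0.0
    return precision, recall

# Toy data, invented for illustration only.
labels = [
    "Congenital stenosis of aortic valve",
    "Congenital cataract",
    "Chronic sinusitis",
    "Chronic kidney disease",
    "Fracture of femur",
]
existing = {
    ("Congenital cataract", "Occurrence", "Congenital"),
    ("Chronic sinusitis", "Clinical course", "Chronic"),
    ("Fracture of femur", "Associated morphology", "Fracture"),
}

regularities = find_regularities(labels)
proposed = propose_axioms(regularities, TEMPLATES)
precision, recall = precision_recall(proposed, existing)
print(f"proposed={len(proposed)} precision={precision:.2f} recall={recall:.2f}")
```

In this sketch, precision is the share of proposed axioms already present in the ontology, and recall is the share of existing axioms that the lexical regularities recover. A proposed axiom that is absent from the ontology lowers precision, but it may also point to a missing axiom that is worth reviewing during auditing.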

Authors

  • Philip van Damme
    Department of Medical Informatics, Amsterdam Public Health research institute, Academic Medical Center, University of Amsterdam, The Netherlands. Electronic address: philip.vandamme@student.uva.nl.
  • Manuel Quesada-Martínez
    Facultad de Informática, Campus de Espinardo, Universidad de Murcia, 30100 Murcia, Spain. Electronic address: manuel.quesada@um.es.
  • Ronald Cornet
    Department of Medical Informatics, Academic Medical Center, University of Amsterdam, Meibergdreef 15, 1105 AZ Amsterdam, The Netherlands; Department of Biomedical Engineering, Linköping University, SE-581 83 Linköping, Sweden.
  • Jesualdo Tomás Fernández-Breis
    Faculty of Computer Science, Universidad de Murcia, IMIB-Arrixaca, Spain.