Using language models and ontology topology to perform semantic mapping of traits between biomedical datasets.

Journal: Bioinformatics (Oxford, England)

Published Date: Apr 3, 2023

Abstract

MOTIVATION: Human traits are typically represented in both the biomedical literature and large population studies as descriptive text strings. Whilst a number of ontologies exist, none of these perfectly represent the entire human phenome and exposome. Mapping trait names across large datasets is therefore time-consuming and challenging. Recent developments in language modelling have created new methods for semantic representation of words and phrases, and these methods offer new opportunities to map human trait names in the form of words and short phrases, both to ontologies and to each other. Here, we present a comparison between a range of established and more recent language modelling approaches for the task of mapping trait names from UK Biobank to the Experimental Factor Ontology (EFO), and also explore how they compare to each other in direct trait-to-trait mapping.
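The mapping task described above, assigning a free-text trait name to its closest ontology term, can be illustrated with a minimal sketch. This is not the authors' method: a character trigram count vector stands in for a language model embedding so the example runs with the standard library alone. The function names (`embed`, `cosine`, `map_trait`), the trait string, and the ontology labels are all hypothetical illustrations and were not taken from UK Biobank or checked against EFO.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy embedding: character n-gram counts.

    Assumption: this is only a stand-in for a language model embedding.
    """
    padded = f" {text.lower().strip()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def map_trait(trait, ontology_labels):
    """Return the (label, score) pair with the highest similarity to the trait."""
    trait_vec = embed(trait)
    scored = [(label, cosine(trait_vec, embed(label))) for label in ontology_labels]
    return max(scored, key=lambda pair: pair[1])


# Hypothetical ontology labels and trait name, for illustration only.
labels = ["body mass index", "systolic blood pressure", "type 2 diabetes mellitus"]
best_label, score = map_trait("Body mass index (BMI)", labels)
```

Replacing `embed` with vectors from a real language model would leave the nearest-label search unchanged. Direct trait-to-trait mapping follows the same pattern, with a second list of trait names in place of the ontology labels.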

Authors

Yi Liu

Department of Interventional Therapy, Ningbo No. 2 Hospital, Ningbo, China.

Benjamin L Elsworth

Our Future Health, Manchester, United Kingdom.

Tom R Gaunt

MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol BS8 2BN, UK.

Keywords

Biological Ontologies; Humans; Language; Natural Language Processing; Phenotype; Semantics

External Resources

PubMed ID: 37010521
