Using language models and ontology topology to perform semantic mapping of traits between biomedical datasets.

Journal: Bioinformatics (Oxford, England)
Published Date:

Abstract

MOTIVATION: Human traits are typically represented in both the biomedical literature and large population studies as descriptive text strings. Whilst a number of ontologies exist, none fully represents the entire human phenome and exposome. Mapping trait names across large datasets is therefore time-consuming and challenging. Recent developments in language modelling have created new methods for the semantic representation of words and phrases, which offer new opportunities to map human trait names, in the form of words and short phrases, both to ontologies and to each other. Here, we compare a range of established and more recent language modelling approaches for the task of mapping trait names from UK Biobank to the Experimental Factor Ontology (EFO), and also explore how they compare with each other in direct trait-to-trait mapping.
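
The general idea of mapping a trait name to its closest ontology label can be illustrated with a minimal sketch. It is not the authors' method: the `embed` function below is a toy character n-gram stand-in for a language-model embedding, and the trait strings, ontology labels, and function names are all hypothetical examples chosen for illustration.

```python
import math
from collections import Counter


def embed(text, n=3):
    """Toy stand-in for a language-model embedding: character n-gram counts.

    ASSUMPTION: a real pipeline would replace this with vectors from a
    trained language model; this version only captures surface overlap.
    """
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_traits(traits, ontology_labels):
    """Map each trait name to its most similar ontology label and score."""
    label_vecs = {label: embed(label) for label in ontology_labels}
    mapping = {}
    for trait in traits:
        vec = embed(trait)
        best = max(label_vecs, key=lambda label: cosine(vec, label_vecs[label]))
        mapping[trait] = (best, cosine(vec, label_vecs[best]))
    return mapping


# Hypothetical inputs, not taken from UK Biobank or EFO releases.
traits = ["Body mass index (BMI)", "Diastolic blood pressure, automated reading"]
labels = ["body mass index", "diastolic blood pressure", "asthma"]

for trait, (label, score) in map_traits(traits, labels).items():
    print(f"{trait!r} -> {label!r} (similarity {score:.2f})")
```

Swapping `embed` for a model that returns dense vectors would leave the nearest-label search unchanged, which is why the comparison of embedding methods can be framed around a single mapping task.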

Authors

  • Yi Liu
    Department of Interventional Therapy, Ningbo No. 2 Hospital, Ningbo, China.
  • Benjamin L Elsworth
    Our Future Health, Manchester, United Kingdom.
  • Tom R Gaunt
    MRC Integrative Epidemiology Unit, School of Social and Community Medicine, University of Bristol, Bristol BS8 2BN, UK.