Acquisition of Character Translation Rules for Supporting SNOMED CT Localizations.

Journal: Studies in health technology and informatics
PMID:

Abstract

Translating large medical terminologies like SNOMED CT is costly and time-consuming. We present a methodology that acquires substring substitution rules for single words, exploiting the known similarity between medical words and their translations that results from their common Latin/Greek origin. Character translation rules are automatically acquired from pairs of English words and their automated translations into German. Using a training set of single words extracted from SNOMED CT as input, we obtained a list of 268 translation rules. In the evaluation, these rules improved the translation of 60% of words compared to Google Translate, and 55% of the translated words exactly matched the correct translations. On a subset of words where machine translation had failed, our method improved the translation in 56% of cases, with 27% exactly matching the gold standard.
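
To make the idea of substring substitution rules concrete, the following is a minimal sketch of how such rules might be applied to a single English word. The three rules, the function name `apply_rules`, and the longest-match-first, left-to-right strategy are all illustrative assumptions; they are not taken from the 268 rules or from the acquisition procedure reported in the abstract.

```python
# Hypothetical English -> German substring rules, chosen for illustration only.
# They are NOT the rules acquired in the study.
RULES = [
    ("ction", "ktion"),  # e.g. infection -> infektion
    ("cture", "ktur"),   # e.g. fracture  -> fraktur
    ("ia", "ie"),        # e.g. pneumonia -> pneumonie
]


def apply_rules(word, rules):
    """Rewrite `word` by scanning left to right and applying the longest
    matching source substring at each position (an assumed strategy)."""
    # Try longer source substrings first so specific rules win over short ones.
    ordered = sorted(rules, key=lambda rule: len(rule[0]), reverse=True)
    out = []
    i = 0
    while i < len(word):
        for src, tgt in ordered:
            if word.startswith(src, i):
                out.append(tgt)
                i += len(src)
                break
        else:
            # No rule matches at this position: copy the character unchanged.
            out.append(word[i])
            i += 1
    return "".join(out)


if __name__ == "__main__":
    for english in ["infection", "fracture", "pneumonia", "tumor"]:
        print(english, "->", apply_rules(english, RULES))
```

The sketch works on lowercase input and does not capitalize the result, although German nouns are capitalized; a word with no matching rule is returned unchanged.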

Authors

  • Jose Antonio Miñarro-Giménez
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.
  • Johannes Hellrich
    Jena University Language & Information Engineering (JULIE) Lab, Friedrich-Schiller-Universität Jena, Jena, Germany.
  • Stefan Schulz
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.