Bilingual term alignment from comparable corpora in English discharge summary and Chinese discharge summary.
Journal:
BMC bioinformatics
Published Date:
May 9, 2015
Abstract
BACKGROUND: Electronic medical record (EMR) systems have become widely used throughout the world to improve the quality of healthcare and the efficiency of hospital services. A bilingual medical lexicon of Chinese and English is needed to meet the demand for the multi-lingual and multi-national treatment. We make efforts to extract a bilingual lexicon from English and Chinese discharge summaries with a small seed lexicon. The lexical terms can be classified into two categories: single-word terms (SWTs) and multi-word terms (MWTs). For SWTs, we use a label propagation (LP; context-based) method to extract candidates of translation pairs. For MWTs, which are pervasive in the medical domain, we propose a term alignment method, which firstly obtains translation candidates for each component word of a Chinese MWT, and then generates their combinations, from which the system selects a set of plausible translation candidates.