Comparison of MetaMap, cTAKES, SIFR, and ECMT to Annotate Breast Cancer Patient Summaries.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Most clinical texts, including breast cancer patient summaries (BCPSs), are written as narrative documents that are difficult for decision support systems to process. Annotators have been developed to extract the relevant content of such documents: MetaMap and cTAKES work with English and map text to UMLS concepts, whereas SIFR and ECMT work with French and provide concepts from various terminologies. We compared the four annotators on a sample of 25 French BCPSs, which were pre-processed to manage acronyms and translated into English. MetaMap extracted the largest number of UMLS concepts (15,458), followed by SIFR (3,784), ECMT (1,962), and cTAKES (1,769). Each annotator extracted specific valuable information that the other annotators did not propose. Because the annotators are complementary, all four should be used in sequence to optimize the results.
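
The comparison described above (concepts found by one annotator but not by the others, and the pooled result of using all annotators together) can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names `unique_concepts` and `combined_concepts` are hypothetical, the concept identifiers are placeholders rather than real UMLS CUIs, and the toy data are not results from the study. The sketch assumes each annotator's output has already been reduced to a set of concept identifiers.

```python
from typing import Dict, Set


def unique_concepts(annotations: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """For each annotator, return the concepts that no other annotator produced."""
    result: Dict[str, Set[str]] = {}
    for name, concepts in annotations.items():
        # Pool the concepts of every annotator except the current one.
        others: Set[str] = set().union(
            *(c for other, c in annotations.items() if other != name)
        )
        result[name] = concepts - others
    return result


def combined_concepts(annotations: Dict[str, Set[str]]) -> Set[str]:
    """Union of all annotators' concepts (annotators treated as complementary)."""
    return set().union(*annotations.values())


# Placeholder identifiers only (assumption): not real UMLS CUIs, not study data.
toy_annotations: Dict[str, Set[str]] = {
    "MetaMap": {"CUI_1", "CUI_2", "CUI_3"},
    "cTAKES": {"CUI_2"},
    "SIFR": {"CUI_3", "CUI_4"},
    "ECMT": {"CUI_4", "CUI_5"},
}

if __name__ == "__main__":
    for annotator, concepts in unique_concepts(toy_annotations).items():
        print(annotator, sorted(concepts))
    print("combined:", sorted(combined_concepts(toy_annotations)))
```

Representing each annotator's output as a set makes the two quantities of interest direct set operations: a set difference for annotator-specific concepts and a union for the pooled annotation.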

Authors

  • Akram Redjdal
    Sorbonne Université, Université Sorbonne Paris Nord, Inserm, UMR S_1142, LIMICS, Paris, France.
  • Jacques Bouaud
    AP-HP, DRCD, Paris, France.
  • Joseph Gligorov
    Sorbonne Université, Institut Universitaire de Cancérologie, Paris, France.
  • Brigitte Seroussi
    Sorbonne Universités, UPMC Université Paris 06, UMR_S 1142, LIMICS, Paris, France.