Evaluation of Document Retrieval Systems on a Medical Corpus in French: Indexation vs. Feature Learning.

Journal: Studies in health technology and informatics

Abstract

This paper presents five document retrieval systems for a small (a few thousand documents), domain-specific corpus (weekly peer-reviewed medical journals published in French), as well as an evaluation methodology to quantify the models' performance. The proposed methodology does not rely on external annotations and can therefore be used as an ad hoc evaluation procedure for most document retrieval tasks. Statistical models and vector space models are empirically compared on a synthetic document retrieval task. For a dataset of this size and specificity, the statistical approaches consistently performed better than their vector space counterparts.

Authors

  • Arnaud Robert
    Service des sciences de l'information médicale, HUG, 1211 Genève 14.
  • Francis Damachi
    Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
  • Mina Bjelogrlic
    Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
  • Jean-Philippe Goldman
    Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
  • Christian Lovis
    Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva.