Performance of Machine Learning Methods to Classify French Medical Publications.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Many medical narratives are read by care professionals in their preferred language. These documents can be produced by organizations, authorities or national publishers. However, they are often hard to find with the usual English-based search engines such as PubMed. This work explores the possibility of automatically categorizing medical documents in French using an automatic Natural Language Processing pipeline. The pipeline is used to compare the performance of six machine learning and deep neural network approaches on a large dataset drawn from a peer-reviewed Swiss medical journal, published weekly in French, covering major topics in medicine over the last 15 years. An accuracy of 96% was achieved for 5-topic classification and 81% for 20-topic classification.

Authors

  • Jamil Zaghir
    Division of Medical Information Sciences, University Hospitals of Geneva.
  • Jean-Philippe Goldman
    Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
  • Mina Bjelogrlic
    Division of Medical Information Sciences, University Hospitals of Geneva and University of Geneva, Geneva, Switzerland.
  • Daniel Keszthelyi
    Division of Medical Information Sciences, University Hospitals of Geneva.
  • Christophe Gaudet-Blavignac
    Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva.
  • Hugues Turbé
    Division of Medical Information Sciences, University Hospitals of Geneva.
  • Belinda Lokaj
    Division of Medical Information Sciences, University Hospitals of Geneva.
  • Christian Lovis
    Division of Medical Information Sciences, Geneva University Hospitals and University of Geneva.