Word Embedding for French Natural Language in Healthcare: A Comparative Study.

Journal: Studies in health technology and informatics
Published Date:

Abstract

Structuring raw medical documents through ontology mapping is the next step for medical intelligence. Deep learning models take mathematically embedded information, such as encoded text, as input. To provide it, word embedding methods represent every word of a text as a fixed-length vector. Three word embedding methods were formally evaluated on raw medical documents. The data comprise more than 12M diverse documents produced at the Rouen hospital (drug prescriptions, discharge and surgery summaries, inter-service letters, etc.). Automatic and manual validation shows that Word2Vec based on the skip-gram architecture scored best on three of the four accuracy tests. This model will now be used as the first layer of an AI-based semantic annotator.
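
As an illustration of the skip-gram idea mentioned above, the following minimal Python sketch generates the (center word, context word) training pairs that a skip-gram model learns to predict. It is not the authors' pipeline: the function name `skipgram_pairs`, the window size, and the example sentence are all hypothetical, and the actual training of the fixed-length vectors is omitted.

```python
from typing import List, Tuple


def skipgram_pairs(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center, context) pairs for every token within `window` positions.

    Hypothetical helper for illustration only; a skip-gram model is trained
    to predict the context word from the center word for each such pair.
    """
    pairs = []
    for i, center in enumerate(tokens):
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i:  # a word is not its own context
                pairs.append((center, tokens[j]))
    return pairs


# Hypothetical tokenized French sentence, not taken from the study corpus.
sentence = ["le", "patient", "présente", "une", "fièvre"]
pairs = skipgram_pairs(sentence, window=1)

for center, context in pairs:
    print(center, "->", context)
```

A larger window captures broader topical context at the cost of more training pairs; the window size used in the study is not stated in this abstract.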

Authors

  • Emeric Dynomant
    OmicX, 72 Rue de la République, 76140, Le Petit Quevilly, Normandie, France.
  • Romain Lelong
    CISMeF & TIBS, LITIS EA 4108, Rouen University Hospital, Rouen, France.
  • Badisse Dahamna
    Department of Biomedical Informatics, Cour Leschevin, CHU de Rouen, 1 Rue de Germont, 76031 Rouen, Normandie, France.
  • Clément Massonnaud
    Department of Biomedical Informatics, Cour Leschevin, CHU de Rouen, 1 Rue de Germont, 76031 Rouen, Normandie, France.
  • Gaëtan Kerdelhué
    Department of Biomedical Informatics, Cour Leschevin, CHU de Rouen, 1 Rue de Germont, 76031 Rouen, Normandie, France.
  • Julien Grosjean
    CISMeF & TIBS, LITIS EA 4108, Rouen University Hospital, Rouen, France.
  • Stéphane Canu
    LITIS, Université de Rouen Normandie, Avenue de l'Université, 76800, Saint-Étienne-du-Rouvray, Normandie, France.
  • Stefan Darmoni
    Department of Biomedical Informatics, Rouen University Hospital, TIBS, LITIS EA 4108 Rouen University, France.