Using Deep Learning to Improve Phenotyping from Clinical Reports.

Journal: Studies in health technology and informatics

Abstract

With the development of clinical databases and the ubiquity of electronic health records (EHRs), physicians and researchers alike have access to an unprecedented amount of data. The complexity of the available data has also increased, since it now includes clinical reports, which require frameworks with natural language processing capabilities to process them and extract information not found in other types of documents. In this work, we implement a data processing pipeline that performs phenotyping, disambiguation, negation and subject prediction on such reports. We compare it to an existing solution routinely used in a children's hospital with a special focus on genetic diseases. We show that by replacing components based on rules and pattern matching with components leveraging deep learning models and fine-tuned word embeddings, we obtain performance improvements of 7%, 10% and 27% in terms of F1 measure for each task. The solution we devised will help build more reliable decision support systems.

Authors

  • Marc Vincent
    Université de Paris, Imagine Institute, Data Science Platform, INSERM UMR 1163, Paris, France.
  • Maxime Douillet
    Institut Imagine, Paris Descartes University-Sorbonne Paris Cité, Paris, France.
  • Ivan Lerner
    Paris University, Paris, France; AP-HP, DSI-WIND, Paris, France.
  • Antoine Neuraz
    Institut National de la Santé et de la Recherche Médicale (INSERM), Centre de Recherche des Cordeliers, UMR 1138 Equipe 22, Paris Descartes, Sorbonne Paris Cité University, Paris, France.
  • Anita Burgun
    Hôpital Necker-Enfants malades, AP-HP, Paris, France.
  • Nicolas Garcelon
    Plateforme data science - institut des maladies génétiques Imagine, Inserm, centre de recherche des Cordeliers, UMR 1138 équipe 22, institut Imagine, Paris-Descartes, université Sorbonne-Paris Cité, Paris, France.