Developing and testing a framework for coding general practitioners' free-text diagnoses in electronic medical records - a reliability study for generating training data in natural language processing.

Journal: BMC Primary Care
Published Date:

Abstract

BACKGROUND: Diagnoses entered by general practitioners into electronic medical records have great potential for research and practice, but they are often recorded as uncoded free text, which limits their usefulness. Natural language processing (NLP) could assist in coding free-text diagnoses, but NLP models require local training data to be effective. The aim of this study was to develop a framework of research-relevant diagnostic codes, to test the framework using free-text diagnoses from a Swiss primary care database, and to generate training data for NLP modelling.

Authors

  • Audrey Wallnöfer
    Institute of Primary Care, University and University Hospital Zurich, Pestalozzistr. 24, Zürich, 8091, Switzerland.
  • Jakob M Burgstaller
    Horten Centre for Patient Oriented Research and Knowledge Transfer, University of Zurich, Zurich, Switzerland.
  • Katja Weiss
    Institute of Primary Care, University of Zurich, Zurich, Switzerland. Electronic address: katja@weiss.co.com.
  • Thomas Rosemann
    Institute of Primary Care, University and University Hospital Zurich, Pestalozzistr. 24, Zürich, 8091, Switzerland.
  • Oliver Senn
    Institute of Primary Care, University and University Hospital Zurich, Pestalozzistr. 24, Zürich, 8091, Switzerland.
  • Stefan Markun
    Institute of Primary Care, University and University Hospital Zurich, Pestalozzistr. 24, Zürich, 8091, Switzerland. stefan.markun@usz.ch.