Natural language processing and machine learning to enable automatic extraction and classification of patients' smoking status from electronic medical records.

Journal: Upsala journal of medical sciences

Abstract

BACKGROUND: The electronic medical record (EMR) offers unique possibilities for clinical research, but some important patient attributes are not readily available because they are stored in unstructured form. We applied text mining with machine learning to automatically classify unstructured information on smoking status in Swedish EMR data.

Authors

  • Andrea Caccamisi
    Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden.
  • Leif Jørgensen
    IQVIA Solutions Sweden AB, Solna, Sweden.
  • Hercules Dalianis
    Department of Computer and Systems Sciences (DSV), Stockholm University, Sweden.
  • Mats Rosenlund
    Department of Learning, Informatics, Management and Ethics, Karolinska Institutet, Stockholm, Sweden.