Boosting ICD multi-label classification of health records with contextual embeddings and label-granularity.

Journal: Computer methods and programs in biomedicine
Published Date:

Abstract

BACKGROUND AND OBJECTIVE: This work deals with clinical text mining, a field of Natural Language Processing applied to biomedical informatics. The aim is to classify Electronic Health Records with respect to the International Classification of Diseases (ICD), which is the foundation for international health statistics and the standard for reporting diseases and health conditions. From a data-mining perspective, the task is multi-label classification, since each health record is assigned multiple ICD codes. We investigate five Deep Learning architectures on a dataset obtained from the Basque Country Health System, under six different perspectives derived from shifts in the input and the output.
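The multi-label setting, and the label-granularity idea named in the title, can be illustrated with a small sketch. It assumes, for illustration only, that each record's labels are a list of ICD-10-style code strings, that a coarser granularity is obtained by truncating each code to its first three characters (the category level), and that a classifier emits one independent score per code. The function names `coarsen`, `multi_hot` and `decide` and the example codes are hypothetical; they are not taken from the paper or its dataset, and this is one possible way to shift the output, not the authors' method.

```python
from typing import Dict, List, Optional, Tuple


def coarsen(code: str, chars: int = 3) -> str:
    """Hypothetical granularity shift: keep the first `chars` characters of an
    ICD code (dot removed), e.g. an ICD-10 subcategory -> its 3-character category."""
    return code.replace(".", "")[:chars]


def multi_hot(records: List[List[str]],
              vocab: Optional[List[str]] = None) -> Tuple[List[str], List[List[int]]]:
    """Encode each record's ICD codes as a 0/1 vector over `vocab`.
    Several positions can be 1 at once, which is what makes the task multi-label."""
    if vocab is None:
        vocab = sorted({c for codes in records for c in codes})
    index: Dict[str, int] = {c: i for i, c in enumerate(vocab)}
    matrix: List[List[int]] = []
    for codes in records:
        row = [0] * len(vocab)
        for c in codes:
            if c in index:  # codes outside the vocabulary are ignored
                row[index[c]] = 1
        matrix.append(row)
    return vocab, matrix


def decide(probs: List[float], vocab: List[str], threshold: float = 0.5) -> List[str]:
    """Multi-label decision: every code whose independent (e.g. sigmoid) score
    reaches the threshold is predicted, so zero, one or many codes may be returned."""
    return [c for c, p in zip(vocab, probs) if p >= threshold]


if __name__ == "__main__":
    # Illustrative label sets for two records (not from the paper's dataset).
    records = [["E11.9", "I10"], ["E11.65", "N18.3", "I10"]]
    print(multi_hot(records))  # fine-grained: 4 distinct codes
    coarse = [[coarsen(c) for c in codes] for codes in records]
    print(multi_hot(coarse))   # coarse-grained: 3 categories
    print(decide([0.9, 0.2, 0.7], ["E11", "I10", "N18"]))  # ['E11', 'N18']
```

Because each code has its own 0/1 target and its own threshold decision, a record can receive any number of codes. Coarsening the labels shrinks the output space (four fine codes collapse into three categories in this toy example) at the cost of diagnostic specificity.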

Authors

  • Alberto Blanco
    IXA Taldea. UPV-EHU, Manuel Lardizabal Ibilbidea, 1, Donostia 20018, Spain. Electronic address: ablanco061@ikasle.ehu.eus.
  • Olatz Perez-de-Viñaspre
  • Alicia Pérez
    IXA Group, University of the Basque Country (UPV-EHU), Computer Engineering Faculty, P. Manuel Lardizabal, 1, 20018 Donostia-San Sebastián, Spain.
  • Arantza Casillas
    IXA Group, University of the Basque Country (UPV-EHU), Computer Engineering Faculty, P. Manuel Lardizabal, 1, 20018 Donostia-San Sebastián, Spain. Electronic address: arantza.casillas@ehu.eus.