A Hybrid AI-Based Method for ICD Classification of Medical Documents.

Journal: Studies in health technology and informatics

Abstract

Automatic document classification is a common problem that has been successfully addressed with machine learning methods. However, these methods require extensive training data, which is not always readily available. Additionally, in privacy-sensitive settings, transferring and reusing trained machine learning models is not an option, because sensitive information could potentially be reconstructed from the model. We therefore propose a transfer learning method that uses ontologies to normalize the feature space of text classifiers into a controlled vocabulary. This ensures that the trained models contain no personal data and can be widely reused without violating the General Data Protection Regulation (GDPR). Furthermore, the ontologies can be enriched so that the classifiers can be transferred to contexts with different terminology without additional training. Applying classifiers trained on medical documents to medical texts written in colloquial language shows promising results and highlights the potential of the approach. Compliance with the GDPR by design opens up many further application domains for transfer-learning-based solutions.
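The abstract does not specify the ontology format or the classifier, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a flat dictionary ontology that maps surface forms to concept IDs, uses greedy longest-match lookup to turn text into a bag of concepts, and trains a multinomial naive Bayes classifier on those concepts. Naive Bayes is a swapped-in choice for illustration and is not necessarily what the authors used. The concept IDs, ontology entries and training sentences are all hypothetical, and the ICD-10 codes I21 (acute myocardial infarction) and E11 (type 2 diabetes mellitus) serve only as example labels. Tokens that match no ontology entry, such as patient names, are discarded, so the fitted model stores only concept counts. Enriching the ontology with colloquial synonyms that map onto existing concepts changes the input mapping without retraining.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy ontology: lowercase surface form -> concept ID (IDs invented).
BASE_ONTOLOGY = {
    "myocardial infarction": "C_MI",
    "chest pain": "C_CHEST_PAIN",
    "diabetes mellitus": "C_DM",
    "hyperglycemia": "C_HYPERGLYCEMIA",
    "polyuria": "C_POLYURIA",
}


def normalize(text, ontology):
    """Map text to a bag of ontology concepts by greedy longest match.

    Tokens matching no ontology entry (names, dates, other free text)
    are discarded, so only controlled-vocabulary concepts remain."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())  # letters only; digits dropped
    max_len = max(len(form.split()) for form in ontology)
    concepts = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            phrase = " ".join(tokens[i:i + n])
            if phrase in ontology:
                concepts.append(ontology[phrase])
                i += n
                break
        else:
            i += 1  # no ontology match starting here: drop the token
    return Counter(concepts)


class ConceptNaiveBayes:
    """Multinomial naive Bayes over ontology concepts (Laplace smoothing).

    The fitted model keeps only class counts and per-concept counts,
    never raw tokens from the training documents."""

    def __init__(self, ontology):
        self.ontology = ontology

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.concept_counts = defaultdict(Counter)
        self.vocab = set()
        for doc, label in zip(docs, labels):
            bag = normalize(doc, self.ontology)
            self.concept_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, doc):
        bag = normalize(doc, self.ontology)
        n_docs = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            counts = self.concept_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_label / n_docs)  # log prior
            for concept, k in bag.items():
                score += k * math.log((counts[concept] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical clinical-style training documents (invented for illustration).
train_docs = [
    "Patient John Doe admitted with acute myocardial infarction and chest pain.",
    "ECG confirms myocardial infarction.",
    "Known type 2 diabetes mellitus with hyperglycemia and polyuria.",
    "Hyperglycemia on admission; diabetes mellitus poorly controlled.",
]
train_labels = ["I21", "I21", "E11", "E11"]
model = ConceptNaiveBayes(BASE_ONTOLOGY).fit(train_docs, train_labels)

# Enrich the ontology with colloquial synonyms for existing concepts.
# The trained counts are reused unchanged; there is no retraining step.
model.ontology = {
    **BASE_ONTOLOGY,
    "heart attack": "C_MI",
    "high blood sugar": "C_HYPERGLYCEMIA",
    "frequent urination": "C_POLYURIA",
}
print(model.predict("My dad had a heart attack"))             # I21
print(model.predict("Doc says my high blood sugar is back"))  # E11
```

With the base ontology alone, a colloquial sentence such as "My dad had a heart attack" maps to an empty concept bag. This is why the sketch enriches the vocabulary rather than retraining the model.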

Authors

  • Daniel Bruness
    KITE, Technische Hochschule Mittelhessen, Friedberg, Germany.
  • Matthias Bay
    MINDS-Medical GmbH, Frankfurt, Germany.
  • Christian Schulze
    KITE, Technische Hochschule Mittelhessen, Friedberg, Germany.
  • Michael Guckert
    Cognitive Information Systems, KITE - Kompetenzzentrum für Informationstechnologie, Technische Hochschule Mittelhessen - University of Applied Sciences, Friedberg, Germany.
  • Mirjam Minor
    Department of Informatics, Goethe University Frankfurt, Frankfurt, Germany.