Evaluating shallow and deep learning strategies for the 2018 n2c2 shared task on clinical text classification.

Journal: Journal of the American Medical Informatics Association : JAMIA
Published Date:

Abstract

OBJECTIVE: Automated clinical phenotyping is challenging because word-based features quickly turn it into a high-dimensional problem, in which small, privacy-restricted training datasets might lead to overfitting. Pretrained embeddings might solve this issue by reusing input representation schemes trained on a larger dataset. We sought to evaluate shallow and deep learning text classifiers and the impact of pretrained embeddings on a small clinical dataset.

Authors

  • Michel Oleynik
    Institute of Mathematics and Statistics, University of São Paulo, São Paulo, Brazil.
  • Amila Kugic
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.
  • Zdenko Kasáč
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.
  • Markus Kreuzthaler
    Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Graz, Austria.