Substituting clinical features using synthetic medical phrases: Medical text data augmentation techniques.

Journal: Artificial intelligence in medicine
Published Date:

Abstract

Biomedical natural language processing (NLP) plays an important role in extracting consequential information from medical discharge notes. Detecting meaningful features in unstructured notes is a challenging task in medical document classification. Domain-specific phrases and varying synonyms within medical documents make them hard to analyze, and the analysis becomes even more challenging for short documents such as abstracts. All of these factors can result in poor classification performance, especially when real-world clinical data are scarce. Two new approaches (an ontology-guided approach and a combined ontology-based and dictionary-based approach) are proposed for augmenting medical data to enrich the training data. Three different deep learning approaches are used to evaluate the classification performance of the proposed methods. The results show that the proposed methods improve accuracy in clinical note classification.
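
As a rough illustration of the general idea of dictionary-based phrase substitution for data augmentation, the following minimal sketch replaces known phrases in a note with a synonym and adds the result to the training set under the same label. It is not the authors' method: the `SYNONYMS` mapping, the function names `augment` and `augment_dataset`, and the example phrases are all hypothetical, and a real system would draw candidate phrases from a medical ontology or curated lexicon rather than a hand-written table.

```python
import random

# Hypothetical toy synonym dictionary (assumption, not from the paper).
# A real system would query a medical ontology or lexicon instead.
SYNONYMS = {
    "heart attack": ["myocardial infarction"],
    "high blood pressure": ["hypertension"],
    "shortness of breath": ["dyspnea"],
}


def augment(note, synonyms=SYNONYMS, rng=None):
    """Return a copy of `note` with each known phrase replaced by a synonym."""
    rng = rng or random.Random(0)
    augmented = note
    # Replace longer phrases first so shorter ones cannot split them.
    for phrase in sorted(synonyms, key=len, reverse=True):
        if phrase in augmented:
            augmented = augmented.replace(phrase, rng.choice(synonyms[phrase]))
    return augmented


def augment_dataset(notes, labels):
    """Append one augmented variant per note, keeping the original label."""
    out_notes, out_labels = list(notes), list(labels)
    for note, label in zip(notes, labels):
        new_note = augment(note)
        if new_note != note:  # skip notes with no substitutable phrase
            out_notes.append(new_note)
            out_labels.append(label)
    return out_notes, out_labels
```

Calling `augment_dataset(notes, labels)` returns the original notes followed by any substituted variants, so the enlarged set can be passed to a classifier in place of the original one. Matching here is plain, case-sensitive substring replacement; a practical version would likely need tokenization and case handling.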

Authors

  • Mahdi Abdollahi
    Victoria University of Wellington, Wellington, New Zealand. Electronic address: mahdi.abdollahi@ecs.vuw.ac.nz.
  • Xiaoying Gao
    School of Engineering and Computer Science, Victoria University of Wellington, Cotton Building, Kelburn Campus, Wellington, 6140, New Zealand.
  • Yi Mei
    Victoria University of Wellington, Wellington, New Zealand. Electronic address: yi.mei@ecs.vuw.ac.nz.
  • Shameek Ghosh
    Advanced Analytics Institute, Faculty of Engineering and IT, University of Technology Sydney (UTS), Australia. Electronic address: Shameek.Ghosh@student.uts.edu.au.
  • Jinyan Li
  • Michael Narag
    Medius Health, Sydney, Australia. Electronic address: michael.narag@mediushealth.org.