An Adversarial Approach to Enable Re-Use of Machine Learning Models and Collaborative Research Efforts Using Synthetic Unstructured Free-Text Medical Data.

Journal: Studies in health technology and informatics
Published Date:

Abstract

We leverage Generative Adversarial Networks (GANs) to produce synthetic free-text medical data with low re-identification risk, and apply these data to replicate machine learning solutions. We trained GAN models to generate free-text cancer pathology reports. Decision models trained using synthetic datasets reported performance metrics that were statistically similar to those of models trained using original test data. Our results further the use of GANs to generate synthetic data for collaborative research and re-use of machine learning models.

Authors

  • Suranga N Kasthurirathne
    Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA.
  • Gregory Dexter
    Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, IN, USA.
  • Shaun J Grannis
    Regenstrief Institute, Indianapolis, IN, USA.