Synthetic Generation of Patient Service Utilization Data: A Scalability Study.

Journal: Studies in health technology and informatics

Abstract

To address privacy and ethical concerns in using health data for machine learning, we evaluate the scalability of advanced synthetic data generation methods, such as GANs, VAEs, copulaGAN, and transformer models, specifically for patient service utilization data. Our study examines five models on data from a Canadian health authority, focusing on training and generation efficiency, data resemblance, and practical utility. Our findings indicate that statistical models excel in efficiency, while most models produce synthetic data that closely mirrors the real data and is useful for real-world applications.

Authors

  • Joseph Howie
    University of Victoria, BC, Canada.
  • Sowmya Balasubramanian
    University of Victoria, BC, Canada.
  • Jonas Bambi
    University of Victoria, BC, Canada.
  • Kenneth Moselle
    University of Victoria, BC, Canada.
  • Venkatesh Srinivasan
    Santa Clara University, CA, USA.
  • Alex Thomo
    University of Victoria, BC, Canada.