Generation and evaluation of synthetic patient data.

Journal: BMC Medical Research Methodology
Published Date:

Abstract

BACKGROUND: Machine learning (ML) has made a significant impact in medicine and cancer research; however, its impact in these areas has been undeniably slower and more limited than in other application domains. A major reason for this has been the limited availability of patient data to the broader ML research community, in large part due to patient privacy protection concerns. High-quality, realistic, synthetic datasets can be leveraged to accelerate methodological developments in medicine. By and large, medical data is high-dimensional and often categorical. These characteristics pose multiple modeling challenges.

Authors

  • Andre Goncalves
    Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA. goncalves1@llnl.gov.
  • Priyadip Ray
    Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA.
  • Braden Soper
    Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA.
  • Jennifer Stevens
    Information Management Systems, 1455 Research Blvd, Suite 315, Rockville, MD, USA.
  • Linda Coyle
    Information Management Services Inc, Calverton, Maryland, USA.
  • Ana Paula Sales
    Lawrence Livermore National Laboratory, 7000 East Ave, Livermore, CA, USA.