On the fidelity versus privacy and utility trade-off of synthetic patient data

Journal: iScience
Published Date:

Abstract

The use of synthetic data is a widely discussed and promising solution for privacy-preserving medical research. Synthetic data may, however, not always rule out the risk of re-identifying characteristics of real patients and can vary greatly in terms of data fidelity and utility. We systematically evaluate the trade-offs between privacy, fidelity, and utility across five synthetic data models and three patient-level datasets. We evaluate fidelity based on statistical similarity to the real data, utility on three machine learning use cases, and privacy via membership inference, singling out, and attribute inference risks. Synthetic data without differential privacy (DP) maintained fidelity and utility without evident privacy breaches, whereas DP-enforced models significantly disrupted correlation structures. K-anonymity-based data sanitization of demographic features, while preserving fidelity, introduced notable privacy risks. Our findings emphasize the need to advance methods that effectively balance privacy, fidelity, and utility in synthetic patient data generation.
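The abstract names its fidelity and sanitization measures only in passing. The following rough illustration is not the authors' implementation. It computes two things. The first is a fidelity score: the mean absolute difference between the pairwise Pearson correlations of the real and the synthetic columns. This is one simple way to quantify the kind of "disrupted correlation structure" reported for DP-enforced models. The second is the k in k-anonymity for a chosen set of quasi-identifiers, such as demographic features. The function names, the dict-of-columns data layout, the choice of Pearson correlation, and the toy records are all hypothetical. The paper's actual fidelity metrics and sanitization procedure may differ.

```python
from collections import Counter
from itertools import combinations
from math import sqrt

# Hypothetical helpers for illustration only; not taken from the paper.


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Assumes neither sequence is constant (zero variance would divide by zero).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def correlation_gap(real, synth, columns):
    """Mean absolute difference between pairwise Pearson correlations.

    `real` and `synth` map column name -> list of numbers. A value of 0 means
    every pairwise correlation is reproduced exactly; the maximum is 2.
    """
    gaps = [
        abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
        for a, b in combinations(columns, 2)
    ]
    return sum(gaps) / len(gaps)


def k_anonymity(records, quasi_identifiers):
    """Size of the smallest equivalence class over the quasi-identifiers.

    `records` is a list of dicts. The table is k-anonymous when every
    combination of quasi-identifier values occurs in at least k records.
    """
    classes = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(classes.values())


if __name__ == "__main__":
    # Toy values, invented for the example.
    real = {"age": [34, 51, 62, 45], "bmi": [22.0, 27.5, 30.1, 25.3]}
    synth = {"age": [36, 49, 60, 47], "bmi": [23.1, 26.0, 29.4, 27.2]}
    print(correlation_gap(real, synth, ["age", "bmi"]))

    records = [
        {"age_band": "30-39", "sex": "F"},
        {"age_band": "30-39", "sex": "F"},
        {"age_band": "50-59", "sex": "M"},
    ]
    print(k_anonymity(records, ["age_band", "sex"]))  # 1: one record is unique
```

A k of 1 means at least one record is unique on its quasi-identifiers. Generalizing or suppressing those values raises k. Note that a high k alone does not rule out attribute inference, which is one of the risks the study measures separately.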

Authors

  • Tim Adams
    Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany.
  • Colin Birkenbihl
    Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin 53757, Germany.
  • Karen Otte
    Berlin Institute of Health (BIH), Berlin, Germany.
  • Hwei Geok Ng
    Fraunhofer Institute for Algorithms and Scientific Computing, Schloss Birlinghoven, Sankt Augustin, Germany.
  • Jonas Adrian Rieling
    Department of Medical Informatics, University Medical Center Goettingen, Goettingen, Germany.
  • Anatol-Fiete Näher
    Digital Global Public Health, Hasso Plattner Institute for Digital Engineering, University of Potsdam, Potsdam, Germany.
  • Ulrich Sax
    Department of Medical Informatics, University Medical Center Goettingen, Goettingen, Germany.
  • Fabian Prasser
    Berlin Institute of Health (BIH), Berlin, Germany.
  • Holger Fröhlich
    Department of Bioinformatics, Fraunhofer Institute for Algorithms and Scientific Computing (SCAI), Sankt Augustin 53757, Germany.
