KIPPS: Knowledge infusion in Privacy Preserving Synthetic Data Generation
Journal:
arXiv
Published Date:
Sep 25, 2024
Abstract
The integration of privacy measures, including differential privacy
techniques, ensures a provable privacy guarantee for the synthetic data.
However, challenges arise for Generative Deep Learning models when tasked with
generating realistic data, especially in critical domains such as Cybersecurity
and Healthcare. Generative Models optimized for continuous data struggle to
model discrete and non-Gaussian features that have domain constraints.
Challenges increase when the training datasets are limited and not diverse. In
such cases, generative models create synthetic data that repeats sensitive
features, which is a privacy risk. Moreover, generative models face
difficulties comprehending attribute constraints in specialized domains. This
leads to the generation of unrealistic data that impacts downstream accuracy.
To address these issues, this paper proposes a novel model, KIPPS, that infuses
Domain and Regulatory Knowledge from Knowledge Graphs into Generative Deep
Learning models for enhanced Privacy Preserving Synthetic data generation. The
novel framework augments the training of generative models with supplementary
context about attribute values and enforces domain constraints during training.
This added guidance enhances the model's capacity to generate realistic and
domain-compliant synthetic data. The proposed model is evaluated on real-world
datasets, specifically in the domains of Cybersecurity and Healthcare, where
domain constraints and rules add to the complexity of the data. Our experiments
evaluate the privacy resilience and downstream accuracy of the model against
benchmark methods, demonstrating its effectiveness in addressing the balance
between privacy preservation and data accuracy in complex domains.