PrivRS: Differentially private synthetic data generation via role similarity
Published in | Computers & security Vol. 157; p. 104616 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.10.2025 |
Summary: | Generating high-quality synthetic data subject to differential privacy (DP) requirements remains a fundamental challenge, particularly for high-dimensional datasets. Existing methods often struggle to balance privacy guarantees against data utility while remaining computationally efficient. To address these limitations, we propose PrivRS, a novel DP-compliant synthetic data generation framework that leverages role similarity (RS) to enhance data fidelity and utility. PrivRS takes a structured approach to marginal selection: it constructs a probabilistic complete graph G that encodes attribute correlations, capturing complex dependencies among attributes. It then applies RS-based filtering to identify informative marginals that satisfy DP constraints, reducing the amount of noise injected while preserving key statistical properties. Finally, the noisy marginals are post-processed and used to synthesize high-utility, privacy-preserving data. Extensive experiments on diverse datasets show that PrivRS achieves significant improvements over state-of-the-art methods. Specifically, PrivRS improves statistical query accuracy and clustering performance by at least 22% for high-cardinality domains while reducing computational overhead by an average of 28%. These results establish PrivRS as a scalable and practical solution for differentially private data synthesis, capable of achieving a good trade-off among privacy, utility, and efficiency, while also performing well in terms of fairness, outlier behavior, and diversity. |
---|---|
ISSN: | 0167-4048 |
DOI: | 10.1016/j.cose.2025.104616 |
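
The summary describes a pipeline that ends by post-processing noisy marginals and sampling synthetic records from them. The record does not give the algorithm itself, so the following is only a minimal, generic sketch of that last stage, not PrivRS: it releases one marginal with the standard Laplace mechanism, clips and normalizes it, and samples from the result. The graph construction and RS-based filtering are not reproduced. All function names (`noisy_marginal`, `post_process`, `sample`) are hypothetical, and the sensitivity of 1 assumes add/remove-one-record neighboring datasets.

```python
import random
from collections import Counter
from itertools import product


def noisy_marginal(rows, attrs, domains, epsilon, rng):
    """Release the marginal over `attrs` with the Laplace mechanism.

    Assumption: neighboring datasets differ by adding or removing one
    record, so the L1 sensitivity of the histogram is 1.
    """
    counts = Counter(tuple(r[a] for a in attrs) for r in rows)
    scale = 1.0 / epsilon  # Laplace scale = sensitivity / epsilon
    noisy = {}
    for cell in product(*(domains[a] for a in attrs)):
        # The difference of two i.i.d. exponentials with mean `scale`
        # is distributed as Laplace(0, scale).
        noise = rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)
        noisy[cell] = counts.get(cell, 0) + noise
    return noisy


def post_process(noisy):
    """Clip negative counts to zero and normalize to a distribution.

    Post-processing a DP output does not consume additional privacy budget.
    """
    clipped = {cell: max(value, 0.0) for cell, value in noisy.items()}
    total = sum(clipped.values())
    if total == 0:
        # Degenerate case: fall back to the uniform distribution.
        return {cell: 1.0 / len(clipped) for cell in clipped}
    return {cell: value / total for cell, value in clipped.items()}


def sample(dist, n, rng):
    """Draw n synthetic cells (attribute-value tuples) from `dist`."""
    cells = list(dist)
    weights = [dist[cell] for cell in cells]
    return rng.choices(cells, weights=weights, k=n)
```

A call such as `sample(post_process(noisy_marginal(rows, ("a", "b"), domains, 1.0, random.Random(0))), 100, random.Random(1))` would yield 100 synthetic `(a, b)` pairs. A full synthesizer would combine many marginals consistently; this sketch handles only one.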