PrivRS: Differentially private synthetic data generation via role similarity

Bibliographic Details
Published in: Computers & Security, Vol. 157, p. 104616
Main Authors: Ye, Xinxin; Zhu, Youwen; Pan, Jie; Zhang, Miao; Deng, Hai
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.10.2025
Summary: Generating high-quality synthetic data subject to differential privacy (DP) requirements remains a fundamental challenge, particularly for high-dimensional datasets. Existing methods often struggle to balance privacy guarantees against data utility while also remaining computationally efficient. To address these deficiencies, we propose PrivRS, a novel DP-compliant synthetic data generation framework that leverages role similarity (RS) to enhance data fidelity and utility. PrivRS introduces a structured approach to marginal selection by constructing a probabilistic complete graph G that encodes attribute correlations, effectively capturing complex dependencies. It then applies RS-based filtering to identify informative marginals that satisfy DP constraints, reducing the amount of injected noise while preserving key statistical properties. Finally, the post-processed noisy marginals are used to synthesize high-utility, privacy-preserving data. Extensive experiments on diverse datasets show that PrivRS achieves significant improvements over state-of-the-art methods. Specifically, PrivRS improves statistical query accuracy and clustering performance by at least 22% for high-cardinality domains while reducing computational overhead by an average of 28%. These results establish PrivRS as a scalable and practical solution for differentially private data synthesis, capable of achieving a good trade-off among privacy, utility, and efficiency, while also performing well in terms of fairness, outlier behavior, and diversity.
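The summary describes releasing noisy marginals under DP, post-processing them, and synthesizing data from them. The sketch below illustrates only that generic step, using the standard Laplace mechanism on a single marginal; it does not implement the probabilistic complete graph G or the RS-based filtering, whose details the summary does not give. The helper names `noisy_marginal` and `sample_from_marginal`, the choice of the Laplace mechanism, and the add/remove-one-record neighbouring definition (sensitivity 1 per marginal) are all assumptions made for illustration.

```python
import numpy as np


def noisy_marginal(data, cols, domain_sizes, epsilon, rng):
    """Return a Laplace-noised, post-processed marginal over `cols`.

    Hypothetical helper for illustration; not the PrivRS procedure.
    `data` is an (n_records, n_attributes) array of integer-encoded values.
    """
    shape = tuple(domain_sizes[c] for c in cols)
    counts = np.zeros(shape, dtype=float)
    # Build the contingency table: one increment per record in its cell.
    np.add.at(counts, tuple(data[:, c] for c in cols), 1.0)
    # Assumes add/remove-one-record neighbours, so the L1 sensitivity of a
    # single marginal is 1 and Laplace noise with scale 1/epsilon gives epsilon-DP.
    noisy = counts + rng.laplace(loc=0.0, scale=1.0 / epsilon, size=shape)
    # Post-processing does not consume privacy budget:
    # clip negative cells and renormalise into a probability table.
    noisy = np.clip(noisy, 0.0, None)
    total = noisy.sum()
    if total == 0.0:
        return np.full(shape, 1.0 / noisy.size)
    return noisy / total


def sample_from_marginal(marginal, n, rng):
    """Draw n synthetic tuples over the marginal's attributes (hypothetical)."""
    flat = rng.choice(marginal.size, size=n, p=marginal.ravel())
    # Convert flat cell indices back to one value per attribute.
    return np.stack(np.unravel_index(flat, marginal.shape), axis=1)


# Usage on toy data: 4 attributes, each with domain {0, 1, 2}.
rng = np.random.default_rng(0)
data = rng.integers(0, 3, size=(1000, 4))
m = noisy_marginal(data, cols=(0, 2), domain_sizes=[3, 3, 3, 3], epsilon=1.0, rng=rng)
synthetic = sample_from_marginal(m, n=500, rng=rng)
print(m.round(3))
print(synthetic[:5])
```

Sampling from one marginal in isolation only reproduces the attributes in that marginal; a full synthesizer must combine several selected marginals into a joint model, which is the part PrivRS's graph-based selection addresses and which this sketch does not attempt.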
ISSN: 0167-4048
DOI: 10.1016/j.cose.2025.104616