Joint Distribution Analysis for Set-Valued Data With Local Differential Privacy

Bibliographic Details
Published in	IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 7106-7117
Main Authors	Huang, Yaxuan, Xue, Kaiping, Zhu, Bin, Wei, David S. L., Sun, Qibin, Lu, Jun
Format	Journal Article
Language	English
Published	IEEE 2024
Subjects	Accuracy; Differential privacy; Estimation; Frequency estimation; joint distribution; Local differential privacy; Perturbation methods; Privacy; privacy preservation; Security; set-valued data

More Information
Summary:	Set-valued data are commonly used to represent subsets of a universal set and are frequently utilized in online services, such as online shopping preferences, website browsing records, and recently visited places. By collecting set-valued data from users, service providers can perform statistical analysis to obtain a joint distribution of service usage data and subsequently learn the association between different kinds of set-valued data to improve the quality of service. However, collecting set-valued data raises privacy concerns about the potential misuse of records to infer individuals' identities and preferences. Although some privacy-preserving aggregation mechanisms for set-valued data have been proposed, they have not yet achieved joint distribution analysis with high accuracy. In this paper, we propose a joint distribution analysis method for set-valued data with local differential privacy (LDP). We design a scalable perturbation mechanism under ε-LDP by limiting the range of users' responses in the collection process and cyclically shifting the set-valued data in an encoded uniform format, ensuring that the size of the universal set does not influence the accuracy of the results. Based on the perturbation method, we develop an analysis method to efficiently obtain association information between two sets. Because the method performs specific bitwise operations on the perturbed data matrices, its computational overhead is linear in the cardinality of the item set. In addition to theoretically analyzing the error bound and proving the security of our work, we conduct extensive experiments on synthetic and real-world datasets, and the results demonstrate that our scheme achieves better utility than existing state-of-the-art approaches.
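The summary mentions recovering association information from bitwise operations on perturbed data matrices. The Python sketch below is a rough illustration of that general idea. It shows how the joint frequency of two item indicators can be estimated from perturbed bit columns using only AND and popcount. It is a generic baseline and NOT the authors' mechanism. It uses plain per-bit randomized response with the budget split as ε/2 per bit, so each user's pair of reports satisfies ε-LDP for those two bits. It omits the response-range limiting and cyclic shifting described above. All function names (keep_prob, randomized_response, pack, estimate_joint_11) and the synthetic data are hypothetical.

```python
import math
import random


def keep_prob(eps_bit):
    """Probability that randomized response reports a bit truthfully."""
    return math.exp(eps_bit) / (1.0 + math.exp(eps_bit))


def randomized_response(bits, eps_bit, rng):
    """Perturb each 0/1 value independently: keep it with keep_prob(eps_bit), else flip it."""
    p = keep_prob(eps_bit)
    return [b if rng.random() < p else 1 - b for b in bits]


def pack(bits):
    """Pack one 0/1 value per user into an int bitmap (user i -> bit i)."""
    return int("".join("1" if b else "0" for b in reversed(bits)) or "0", 2)


def popcount(x):
    """Number of set bits in a non-negative int."""
    return bin(x).count("1")


def estimate_joint_11(col_a, col_b, n, eps):
    """Unbiased estimate of the fraction of users whose true bits are (1, 1).

    col_a and col_b are bitmaps of perturbed reports. Each bit is assumed to
    have been perturbed with budget eps / 2 (hypothetical budget split); eps > 0.
    """
    p = keep_prob(eps / 2)
    # Counts of the four observed cells, obtained with AND and popcount only.
    n11 = popcount(col_a & col_b)
    n10 = popcount(col_a) - n11
    n01 = popcount(col_b) - n11
    n00 = n - n11 - n10 - n01
    # Row "true bit = 1" of the inverse of the per-bit channel [[p, 1-p], [1-p, p]].
    w1 = p / (2 * p - 1)          # weight for an observed 1
    w0 = -(1 - p) / (2 * p - 1)   # weight for an observed 0
    # Apply the Kronecker product of the two inverse rows to the observed counts.
    return (w1 * w1 * n11 + w1 * w0 * (n10 + n01) + w0 * w0 * n00) / n


# Usage on hypothetical synthetic data: item b is correlated with item a.
rng = random.Random(7)
n, eps = 100_000, 4.0
true_a = [1 if rng.random() < 0.3 else 0 for _ in range(n)]
true_b = [a if rng.random() < 0.8 else 1 - a for a in true_a]
true_11 = sum(a & b for a, b in zip(true_a, true_b)) / n
col_a = pack(randomized_response(true_a, eps / 2, rng))
col_b = pack(randomized_response(true_b, eps / 2, rng))
est = estimate_joint_11(col_a, col_b, n, eps)
print(round(true_11, 3), round(est, 3))
```

Packing each column across users into one integer keeps the estimation step to a few whole-column bitwise operations, in the spirit of the bitwise processing described in the summary. The estimator inverts the known 2×2 randomization channel per bit, so its variance grows as ε shrinks.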
ISSN:	1556-6013, 1556-6021
DOI:	10.1109/TIFS.2024.3423657