Joint Distribution Analysis for Set-Valued Data With Local Differential Privacy
Published in | IEEE transactions on information forensics and security Vol. 19; pp. 7106 - 7117 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | IEEE, 2024 |
Subjects | |
Summary: | Set-valued data are commonly used to represent subsets of a universal set and are frequently utilized in online services, such as online shopping preferences, website browsing records, and recently visited places. By collecting set-valued data from users, service providers can perform statistical analysis to obtain a joint distribution of service usage data and subsequently learn the association between different kinds of set-valued data to improve the quality of service. However, collecting set-valued data raises privacy concerns about the potential misuse of records to infer individuals' identities and preferences. Although some privacy-preserving aggregation mechanisms for set-valued data have been proposed, they have not yet achieved joint distribution analysis with high accuracy. In this paper, we propose a joint distribution analysis method for set-valued data with local differential privacy (LDP). We design a scalable perturbation mechanism under $\epsilon$-LDP by limiting the range of users' responses in the collection process and cyclically shifting the set-valued data in an encoded uniform format, ensuring that the size of the universal set does not influence the accuracy of the results. Based on the perturbation method, we develop an analysis method to efficiently obtain association information between two sets. By performing specific bitwise operations on the perturbed data matrices, the computational overhead is linear with respect to the cardinality of the item set. In addition to theoretically analyzing the error bound and proving the security of our work, extensive experimental results on synthetic and real-world datasets demonstrate that our scheme achieves better utility than existing state-of-the-art approaches. |
---|---|
ISSN: | 1556-6013 1556-6021 |
DOI: | 10.1109/TIFS.2024.3423657 |