Differential Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage

Bibliographic Details
Published in	IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 6665-6678
Main Authors	Yin, Weifeng; Yuan, Lifeng; Ren, Yizhi; Meng, Weizhi; Wang, Dong; Wang, Qiuhua
Format	Journal Article
Language	English
Published	New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2024
Subjects	Accuracy; Algorithms; Bloom filter; Clustering; Coding; Couplings; Differential cryptanalysis; Encoding; Entity resolution; Filters; Graphs; Hash functions; Parameter identification; Privacy; Privacy attack; Re-identification; Record linkage

Summary:	Privacy-preserving record linkage (PPRL) aims to link records of the same real-world entity across different databases without exposing any private information about that entity. Bloom filters are widely used in PPRL because they encode records effectively while still enabling fast approximate linkage when attribute values contain errors or have changed. However, the basic Bloom filters used for PPRL are vulnerable to cryptanalysis attacks that expose the plaintext values encoded in them. Recent studies have successfully attacked some improved Bloom filter encodings in PPRL, but they require specific conditions or knowledge of various encoding parameters to achieve high accuracy. This paper presents a novel attack, based on differential analysis, against Bloom filters used for PPRL. The attack uses graphs to model the relationship between variations in attribute values and the differences between the corresponding Bloom filters. Features are then generated for the nodes in these graphs using a clustering algorithm that we propose, so that nodes with similar features can be matched to re-identify encoded records. Experiments on two real-world databases show that, even against improved Bloom filter encodings and some hardening techniques, our attack can re-identify private information from encoded records with high accuracy while requiring less prior knowledge.
ISSN:	1556-6013; 1556-6021
DOI:	10.1109/TIFS.2024.3421292
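
Illustration:	The summary refers to encoding attribute values into Bloom filters and to the "difference between Bloom filters" that arises when an attribute value varies. The sketch below is not the attack from the paper; it only illustrates, under assumed parameters, a basic q-gram Bloom filter encoding of the kind commonly described for PPRL, and how a small change in a value flips only a few bit positions. The filter length (64), the number of hash functions (2), the use of bigrams, the double-hashing scheme built from SHA-1 and SHA-256, and all function names are assumptions chosen for the example.

```python
import hashlib


def qgrams(value, q=2):
    """Split a string into its set of overlapping q-grams (bigrams by default)."""
    value = value.lower()
    return {value[i:i + q] for i in range(len(value) - q + 1)}


def bloom_encode(value, length=64, num_hashes=2, q=2):
    """Encode the q-grams of a string into a Bloom filter.

    The filter is returned as the set of bit positions that are set to 1.
    Bit positions come from a double-hashing scheme (an assumption here):
    position_i = (h1 + i * h2) mod length.
    """
    bits = set()
    for gram in qgrams(value, q):
        data = gram.encode("utf-8")
        h1 = int(hashlib.sha1(data).hexdigest(), 16)
        h2 = int(hashlib.sha256(data).hexdigest(), 16)
        for i in range(num_hashes):
            bits.add((h1 + i * h2) % length)
    return bits


def dice_similarity(a, b):
    """Dice coefficient between two Bloom filters given as sets of set bits."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def bit_difference(a, b):
    """Bit positions that are set in exactly one of the two filters (XOR)."""
    return a ^ b


if __name__ == "__main__":
    bf_a = bloom_encode("smith")
    bf_b = bloom_encode("smyth")
    print("Dice similarity:", dice_similarity(bf_a, bf_b))
    print("Differing bit positions:", sorted(bit_difference(bf_a, bf_b)))
```

Because the two values share the bigrams "sm" and "th", any differing bit position can only come from the bigrams that changed ("mi", "it", "my", "yt"). This link between a variation in the attribute value and a difference in the encoding is the kind of relationship the summary says the attack models with graphs.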