Differential Cryptanalysis of Bloom Filters for Privacy-Preserving Record Linkage

Bibliographic Details
Published in IEEE Transactions on Information Forensics and Security, Vol. 19, pp. 6665-6678
Main Authors Yin, Weifeng; Yuan, Lifeng; Ren, Yizhi; Meng, Weizhi; Wang, Dong; Wang, Qiuhua
Format Journal Article
Language English
Published New York IEEE 2024
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Summary: Privacy-preserving record linkage (PPRL) aims to link records of the same real-world entity from different databases without exposing any private information about the entity. Bloom filters are widely used in PPRL because they encode records effectively while enabling fast approximate linkage when attribute values contain errors or have changed. However, the basic Bloom filters used for PPRL can be subject to cryptanalysis attacks that expose the plaintext values encoded in them. Recent studies have successfully attacked some improved Bloom filter encodings in PPRL, but they require specific conditions or knowledge of various encoding parameters to obtain high accuracy. This paper presents a novel attack based on differential analysis against Bloom filters used for PPRL. The attack uses graphs to model the relationship between attribute value variation and the difference between Bloom filters. Features are then generated for the nodes in the graphs according to a clustering algorithm that we propose, so that nodes with similar features can be matched to re-identify encoded records. Experiments on two real-world databases show that, even with improved Bloom filter encoding and some hardening techniques, our attack can re-identify private information from encoded records with high accuracy and requires less prior knowledge.
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2024.3421292
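
Illustrative note: the summary refers to the relationship between attribute value variation and the difference between Bloom filters. The sketch below is not the authors' attack. It only shows, under assumed parameters, how a basic Bloom filter encoding of character q-grams reacts when an attribute value changes slightly. The filter length (64 bits), the number of hash functions (3), the seeded SHA-256 hashing, the padded bigrams, and all function names are assumptions chosen for illustration and do not come from the paper.

```python
import hashlib


def qgrams(value, q=2):
    """Split a padded, lower-cased string into its set of q-grams."""
    padded = "_" * (q - 1) + value.lower() + "_" * (q - 1)
    return {padded[i:i + q] for i in range(len(padded) - q + 1)}


def bloom_encode(value, num_bits=64, num_hashes=3):
    """Return the set of bit positions set when each q-gram is hashed num_hashes times.

    Hypothetical encoding: seeded SHA-256, reduced modulo num_bits.
    """
    bits = set()
    for gram in qgrams(value):
        for seed in range(num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode("utf-8")).digest()
            bits.add(int.from_bytes(digest[:8], "big") % num_bits)
    return bits


def dice(a, b):
    """Dice coefficient between two sets of bit positions."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


smith = bloom_encode("smith")
smyth = bloom_encode("smyth")

# Bit positions that differ between the two encodings (symmetric difference).
# Every such bit can only have been set by a q-gram that the two values do not share.
diff = smith ^ smyth
print("differing bit positions:", sorted(diff))
print("Dice similarity:", dice(smith, smyth))
```

Because bits set by shared q-grams appear in both filters, the differing bit positions carry information only about the q-grams that changed ("mi", "it", "my", "yt" in this example), which is the kind of signal a differential analysis can work with.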