Subject-Level Membership Inference Attack via Data Augmentation and Model Discrepancy

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 18, pp. 5848-5859
Main Authors: Liu, Yimin; Jiang, Peng; Zhu, Liehuang
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2023

Summary: Federated learning (FL) models are vulnerable to membership inference attacks (MIAs), and the requirement of individual privacy motivates the protection of subjects whose data is distributed across multiple users in the cross-silo FL setting. In this paper, we propose a subject-level membership inference attack based on data augmentation and model discrepancy. It can effectively infer whether the data distribution of the target subject has been sampled and used for training by a specific federated user, even if other users may also sample from the same subject and use it as part of their training sets. Specifically, the adversary uses a generative adversarial network (GAN) to perform data augmentation on a small amount of a priori federation-associated information known in advance. Subsequently, the adversary merges two different outputs from the global model and the tested user model using an optimal feature construction method. We simulate a controlled federation configuration and conduct extensive experiments on real datasets that include both image and categorical data. Results show that the area under the curve (AUC) is improved by 12.6% to 16.8% compared to the classical membership inference attack. This comes at the expense of test accuracy on the GAN-augmented data, which is at most 3.5% lower than on the real test data. We also explore the degree of privacy leakage between overfitted and well-generalized models in the cross-silo FL setting and conclude experimentally that the former is more likely to leak individual privacy, with a subject-level degradation rate of up to 0.43. Finally, we present two possible defense mechanisms to attenuate this newly discovered privacy risk.
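The attack's core step, as described in the summary, combines the outputs of the global model and the tested user's local model into a single feature vector for a membership classifier. The sketch below illustrates one plausible reading of that "model discrepancy" idea; it is not the paper's implementation, and the softmax probing, the concatenation-plus-discrepancy feature construction, and the logistic-regression attack model are assumptions made purely for illustration.

```python
# Minimal sketch (not the authors' implementation) of building attack features
# from the discrepancy between a global FL model and a tested user's model.
# The adversary queries both models with (GAN-augmented) samples from the
# target subject's distribution and feeds the merged outputs to a binary
# membership classifier. All shapes, names, and the classifier are assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def softmax(logits):
    """Convert raw logits to class-probability vectors."""
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

# Stand-ins for model outputs on n augmented probe samples; in practice these
# would come from querying the global model and the tested user's model.
n, num_classes = 200, 10
p_global = softmax(rng.normal(size=(n, num_classes)))
p_user = softmax(rng.normal(size=(n, num_classes)))

# One plausible "merged" feature: concatenate both probability vectors with
# their element-wise discrepancy, so the attack model can exploit how far the
# tested user's model deviates from the global model on subject data.
features = np.concatenate([p_global, p_user, p_user - p_global], axis=1)

# Toy membership labels; a real attacker would derive them from controlled
# federations (shadow training), as in the paper's experimental setup.
labels = rng.integers(0, 2, size=n)

attack_model = LogisticRegression(max_iter=1000).fit(features, labels)
membership_scores = attack_model.predict_proba(features)[:, 1]
print("mean inferred membership score:", membership_scores.mean())
```

In a full attack, the probe samples would be the GAN-augmented subject data described in the summary, and the attack classifier would be trained on federations the adversary controls before being applied to the target federation.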
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2023.3318950