Gradient-Leaks: Enabling Black-box Membership Inference Attacks against Machine Learning Models

Bibliographic Details
Published in: IEEE Transactions on Information Forensics and Security, Vol. 19, p. 1
Main Authors: Liu, Gaoyang; Xu, Tianlong; Zhang, Rui; Wang, Zixiong; Wang, Chen; Liu, Ling
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.01.2024

More Information
Summary: Machine Learning (ML) techniques have been applied to many real-world applications to perform a wide range of tasks. In practice, ML models are typically deployed as black-box APIs to protect the model owner's interests and/or defend against various privacy attacks. In this paper, we present Gradient-Leaks as the first evidence showcasing the possibility of performing membership inference attacks (MIAs), which aim to determine whether a data record was used to train a given target ML model, with mere black-box access. The key idea of Gradient-Leaks is to construct a local ML model around the given record that locally approximates the target model's prediction behavior. By extracting the membership information of the given record from the gradient of the substitute local model using an intentionally modified autoencoder, Gradient-Leaks can breach the membership privacy of the target model's training data in an unsupervised manner, without any prior knowledge of the target model's internals or its training data. Extensive experiments on different types of ML models with real-world datasets show that Gradient-Leaks achieves better performance than state-of-the-art attacks.
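To make the summarized pipeline concrete, below is a minimal sketch of a Gradient-Leaks-style attack, not the authors' implementation. It assumes a hypothetical black-box query function `target_predict_proba`, uses a hand-rolled logistic model as the local surrogate around the candidate record, and stands in a plain MLP autoencoder (via scikit-learn) for the paper's intentionally modified autoencoder, using reconstruction error as the unsupervised membership signal.

```python
# Hedged sketch of a Gradient-Leaks-style pipeline (illustrative, not the paper's code).
# Assumptions: `target_predict_proba` is a hypothetical black-box API returning class
# probabilities for a batch of inputs; the local surrogate is a simple logistic model;
# the paper's modified autoencoder is approximated by a plain MLP autoencoder.
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

def local_surrogate_gradient(x, target_predict_proba, n_neighbors=200,
                             sigma=0.1, epochs=200, lr=0.5):
    """Fit a logistic surrogate around x on black-box outputs; return its loss gradient at x."""
    # 1) Sample perturbed neighbors of x and query the black-box target model.
    neighbors = x + sigma * rng.standard_normal((n_neighbors, x.shape[0]))
    soft_labels = target_predict_proba(neighbors)[:, 1]          # P(class 1) from the API

    # 2) Fit a local logistic model (w, b) by gradient descent on the soft labels.
    w, b = np.zeros(x.shape[0]), 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(neighbors @ w + b)))
        err = p - soft_labels
        w -= lr * (neighbors.T @ err) / n_neighbors
        b -= lr * err.mean()

    # 3) Gradient of the surrogate's loss at the record itself, labeled by the
    #    target model's own prediction on x.
    y_x = target_predict_proba(x[None, :])[0, 1]
    p_x = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return np.concatenate([(p_x - y_x) * x, [p_x - y_x]])        # d(loss)/dw and d(loss)/db

def infer_membership(gradients):
    """Unsupervised membership decision from per-record surrogate gradients."""
    ae = MLPRegressor(hidden_layer_sizes=(8,), max_iter=2000, random_state=0)
    ae.fit(gradients, gradients)                                  # train to reconstruct the gradients
    errors = np.linalg.norm(ae.predict(gradients) - gradients, axis=1)
    # Split records into two clusters by reconstruction error; take the
    # lower-error cluster as the presumed members.
    labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(errors.reshape(-1, 1))
    member_cluster = labels[np.argmin(errors)]
    return labels == member_cluster
```

Usage follows the abstract's flow: compute `local_surrogate_gradient` for every candidate record, stack the results into a matrix, and pass it to `infer_membership` to obtain per-record membership guesses without any labeled member/non-member data.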
ISSN: 1556-6013, 1556-6021
DOI: 10.1109/TIFS.2023.3324772