Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks

Bibliographic Details
Published in: Text, Speech, and Dialogue, Vol. 12284, pp. 273-281
Main Authors: Thomas, Aleena; Adelani, David Ifeoluwa; Davody, Ali; Mogadala, Aditya; Klakow, Dietrich
Format: Book Chapter
Language: English
Published: Switzerland: Springer International Publishing AG, 2020
Series: Lecture Notes in Computer Science
ISBN: 9783030583224; 3030583228
ISSN: 0302-9743; 1611-3349
DOI: 10.1007/978-3-030-58323-1_30

Summary: The sensitive information present in the training data poses a privacy concern for applications, as its unintended memorization during training can make models susceptible to membership inference and attribute inference attacks. In this paper, we investigate this problem for various pre-trained word embeddings (GloVe, ELMo and BERT) with the help of language models built on top of them. In particular, sequences containing sensitive information, such as a single-word disease name or a 4-digit PIN, are first randomly inserted into the training data; a language model is then trained using the word vectors as input features, and memorization is measured with a metric termed exposure. The embedding dimension, the number of training epochs, and the length of the secret information were observed to affect memorization in pre-trained embeddings. Finally, to address the problem, differentially private language models were trained to reduce the exposure of sensitive information.
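The exposure metric referenced in the summary is the rank-based measure of Carlini et al. ("The Secret Sharer", 2019). As a rough illustration only (the chapter's own experimental code is not reproduced here, and all names below are hypothetical), the following sketch computes exposure for an inserted secret given the model's log-perplexity over the full candidate space:

import math

def exposure(candidate_log_perplexities, inserted_secret):
    """Rank-based exposure (after Carlini et al., 2019).

    candidate_log_perplexities: dict mapping every candidate secret in the
        randomness space R (e.g. all 10,000 four-digit PINs) to its
        log-perplexity under the trained language model.
    inserted_secret: the secret actually inserted into the training data.

    exposure = log2(|R|) - log2(rank of inserted_secret), where rank 1
    means the model finds the inserted secret the most likely candidate.
    """
    ranked = sorted(candidate_log_perplexities, key=candidate_log_perplexities.get)
    rank = ranked.index(inserted_secret) + 1  # 1-based rank by ascending log-perplexity
    return math.log2(len(candidate_log_perplexities)) - math.log2(rank)

# Toy usage: with |R| = 10,000 PINs and the inserted PIN ranked first,
# exposure = log2(10000), roughly 13.3 bits (the maximum for this space).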