Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks

Bibliographic Details
Published in Text, Speech, and Dialogue, Vol. 12284, pp. 273-281
Main Authors Thomas, Aleena; Adelani, David Ifeoluwa; Davody, Ali; Mogadala, Aditya; Klakow, Dietrich
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2020
Springer International Publishing
Series Lecture Notes in Computer Science
Subjects Differential privacy; Unintended memorization; Word representations
ISBN 9783030583224
3030583228
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-030-58323-1_30


Abstract Sensitive information present in training data poses a privacy concern for applications, as its unintended memorization during training can make models susceptible to membership inference and attribute inference attacks. In this paper, we investigate this problem in various pre-trained word embeddings (GloVe, ELMo and BERT) with the help of language models built on top of them. In particular, sequences containing sensitive information, such as a single-word disease name or a 4-digit PIN, are first randomly inserted into the training data; a language model is then trained using the word vectors as input features, and memorization is measured with a metric termed exposure. The embedding dimension, the number of training epochs, and the length of the secret information were observed to affect memorization in pre-trained embeddings. Finally, to address the problem, differentially private language models were trained to reduce the exposure of sensitive information.
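The exposure metric mentioned in the abstract follows the canary-insertion approach to measuring unintended memorization: a secret-bearing sequence is planted in the training data, and exposure compares the trained model's ranking of the true secret against every other candidate drawn from the same randomness space. The Python sketch below is a minimal illustration of that computation under stated assumptions, not the authors' implementation; the scorer lm_log_perplexity, the "my PIN is ####" canary format, and the 4-digit PIN candidate space are hypothetical choices that mirror the setup described above.

    import math

    def exposure(log_perplexity, canary, candidate_space):
        """Exposure of an inserted canary:
        exposure = log2(|R|) - log2(rank of the canary by model perplexity),
        where R is the space of possible secrets (here, all 4-digit PINs)."""
        canary_ppl = log_perplexity(canary)
        # Rank 1 means the canary is the most likely (lowest-perplexity) candidate.
        rank = 1 + sum(1 for c in candidate_space if log_perplexity(c) < canary_ppl)
        return math.log2(len(candidate_space)) - math.log2(rank)

    if __name__ == "__main__":
        # Candidate secret space |R| = 10^4 possible PINs, wrapped in the canary template.
        pins = [f"{i:04d}" for i in range(10000)]
        candidates = [f"my PIN is {p}" for p in pins]

        # Toy stand-in for a language model built on top of frozen GloVe/ELMo/BERT
        # vectors: it assigns the lowest perplexity to the planted secret "1234".
        def lm_log_perplexity(sentence):
            return 0.0 if sentence.endswith("1234") else 1.0

        print(exposure(lm_log_perplexity, "my PIN is 1234", candidates))  # ~13.3 bits

An exposure close to log2(|R|) (about 13.3 bits for 10,000 PINs) means the model ranks the planted secret at or near the top of all candidates, i.e. strong memorization; the differentially private training mentioned in the abstract (for example, noisy gradient updates in the style of DP-SGD) aims to push this value down.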
Author Adelani, David Ifeoluwa
Davody, Ali
Mogadala, Aditya
Klakow, Dietrich
Thomas, Aleena
Author_xml – sequence: 1
  givenname: Aleena
  orcidid: 0000-0003-3606-8405
  surname: Thomas
  fullname: Thomas, Aleena
  email: athomas@lsv.uni-saarland.de
– sequence: 2
  givenname: David Ifeoluwa
  surname: Adelani
  fullname: Adelani, David Ifeoluwa
– sequence: 3
  givenname: Ali
  surname: Davody
  fullname: Davody, Ali
– sequence: 4
  givenname: Aditya
  surname: Mogadala
  fullname: Mogadala, Aditya
– sequence: 5
  givenname: Dietrich
  surname: Klakow
  fullname: Klakow, Dietrich
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2020
Copyright_xml – notice: Springer Nature Switzerland AG 2020
DBID FFUUA
DEWEY 006.35
DOI 10.1007/978-3-030-58323-1_30
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 3030583236
9783030583231
EISSN 1611-3349
Editor Pala, Karel
Horák, Aleš
Kopeček, Ivan
Sojka, Petr
Editor_xml – sequence: 1
  fullname: Pala, Karel
– sequence: 2
  fullname: Sojka, Petr
– sequence: 3
  fullname: Horák, Aleš
– sequence: 4
  fullname: Kopeček, Ivan
EndPage 281
ExternalDocumentID EBC6326333_310_285
ISBN 9783030583224
3030583228
ISSN 0302-9743
IngestDate Tue Jul 29 20:14:37 EDT 2025
Wed May 28 23:25:52 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
Language English
LinkModel OpenURL
OCLC 1193134444
ORCID 0000-0003-3606-8405
PQID EBC6326333_310_285
PageCount 9
ParticipantIDs springer_books_10_1007_978_3_030_58323_1_30
proquest_ebookcentralchapters_6326333_310_285
PublicationCentury 2000
PublicationDate 2020
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – year: 2020
  text: 2020
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings
PublicationTitle Text, Speech, and Dialogue
PublicationYear 2020
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 273
SubjectTerms Differential privacy
Unintended memorization
Word representations
Title Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6326333&ppg=285
http://link.springer.com/10.1007/978-3-030-58323-1_30
Volume 12284