Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks

Bibliographic Details
Published in Text, Speech, and Dialogue, Vol. 12284, pp. 273-281
Main Authors Thomas, Aleena; Adelani, David Ifeoluwa; Davody, Ali; Mogadala, Aditya; Klakow, Dietrich
Format Book Chapter
Language English
Published Switzerland Springer International Publishing AG 2020
Springer International Publishing
Series Lecture Notes in Computer Science
Subjects Differential privacy; Unintended memorization; Word representations
ISBN 9783030583224
3030583228
ISSN 0302-9743
1611-3349
DOI 10.1007/978-3-030-58323-1_30


Abstract Sensitive information present in training data poses a privacy concern for applications, as its unintended memorization during training can make models susceptible to membership inference and attribute inference attacks. In this paper, we investigate this problem in various pre-trained word embeddings (GloVe, ELMo and BERT) with the help of language models built on top of them. In particular, sequences containing sensitive information, such as a single-word disease name or a 4-digit PIN, are first randomly inserted into the training data; a language model is then trained using the word vectors as input features, and memorization is measured with a metric termed exposure. The embedding dimension, the number of training epochs, and the length of the secret information were observed to affect memorization in pre-trained embeddings. Finally, to address the problem, differentially private language models were trained to reduce the exposure of sensitive information.
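The exposure metric mentioned in the abstract follows the canary-insertion approach to measuring unintended memorization: a secret-bearing sequence is planted in the training data, and exposure compares the trained model's ranking of the true secret against every other candidate drawn from the same randomness space. The Python sketch below is a minimal illustration of that computation under stated assumptions, not the authors' implementation; the scorer lm_log_perplexity, the "my PIN is ####" canary format, and the 4-digit PIN candidate space are hypothetical choices that mirror the setup described above.

    import math

    def exposure(log_perplexity, canary, candidate_space):
        """Exposure of an inserted canary:
        exposure = log2(|R|) - log2(rank of the canary by model perplexity),
        where R is the space of possible secrets (here, all 4-digit PINs)."""
        canary_ppl = log_perplexity(canary)
        # Rank 1 means the canary is the most likely (lowest-perplexity) candidate.
        rank = 1 + sum(1 for c in candidate_space if log_perplexity(c) < canary_ppl)
        return math.log2(len(candidate_space)) - math.log2(rank)

    if __name__ == "__main__":
        # Candidate secret space |R| = 10^4 possible PINs, wrapped in the canary template.
        pins = [f"{i:04d}" for i in range(10000)]
        candidates = [f"my PIN is {p}" for p in pins]

        # Toy stand-in for a language model built on top of frozen GloVe/ELMo/BERT
        # vectors: it assigns the lowest perplexity to the planted secret "1234".
        def lm_log_perplexity(sentence):
            return 0.0 if sentence.endswith("1234") else 1.0

        print(exposure(lm_log_perplexity, "my PIN is 1234", candidates))  # ~13.3 bits

An exposure close to log2(|R|) (about 13.3 bits for 10,000 PINs) means the model ranks the planted secret at or near the top of all candidates, i.e. strong memorization; the differentially private training mentioned in the abstract (for example, noisy gradient updates in the style of DP-SGD) aims to push this value down.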
Author Adelani, David Ifeoluwa
Davody, Ali
Mogadala, Aditya
Klakow, Dietrich
Thomas, Aleena
Author_xml – sequence: 1
  givenname: Aleena
  orcidid: 0000-0003-3606-8405
  surname: Thomas
  fullname: Thomas, Aleena
  email: athomas@lsv.uni-saarland.de
– sequence: 2
  givenname: David Ifeoluwa
  surname: Adelani
  fullname: Adelani, David Ifeoluwa
– sequence: 3
  givenname: Ali
  surname: Davody
  fullname: Davody, Ali
– sequence: 4
  givenname: Aditya
  surname: Mogadala
  fullname: Mogadala, Aditya
– sequence: 5
  givenname: Dietrich
  surname: Klakow
  fullname: Klakow, Dietrich
ContentType Book Chapter
Copyright Springer Nature Switzerland AG 2020
Copyright_xml – notice: Springer Nature Switzerland AG 2020
DBID FFUUA
DEWEY 006.35
DOI 10.1007/978-3-030-58323-1_30
DatabaseName ProQuest Ebook Central - Book Chapters - Demo use only
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Computer Science
EISBN 3030583236
9783030583231
EISSN 1611-3349
Editor Pala, Karel
Horák, Aleš
Kopeček, Ivan
Sojka, Petr
Editor_xml – sequence: 1
  fullname: Pala, Karel
– sequence: 2
  fullname: Sojka, Petr
– sequence: 3
  fullname: Horák, Aleš
– sequence: 4
  fullname: Kopeček, Ivan
EndPage 281
ExternalDocumentID EBC6326333_310_285
ISBN 9783030583224
3030583228
ISSN 0302-9743
IngestDate Tue Jul 29 20:14:37 EDT 2025
Wed May 28 23:25:52 EDT 2025
IsPeerReviewed true
IsScholarly true
LCCallNum Q334-342
Language English
LinkModel OpenURL
OCLC 1193134444
ORCID 0000-0003-3606-8405
PQID EBC6326333_310_285
PageCount 9
ParticipantIDs springer_books_10_1007_978_3_030_58323_1_30
proquest_ebookcentralchapters_6326333_310_285
PublicationCentury 2000
PublicationDate 2020
PublicationDateYYYYMMDD 2020-01-01
PublicationDate_xml – year: 2020
  text: 2020
PublicationDecade 2020
PublicationPlace Switzerland
PublicationPlace_xml – name: Switzerland
– name: Cham
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSeriesTitleAlternate Lect.Notes Computer
PublicationSubtitle 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings
PublicationTitle Text, Speech, and Dialogue
PublicationYear 2020
Publisher Springer International Publishing AG
Springer International Publishing
Publisher_xml – name: Springer International Publishing AG
– name: Springer International Publishing
RelatedPersons Hartmanis, Juris
Gao, Wen
Bertino, Elisa
Woeginger, Gerhard
Goos, Gerhard
Steffen, Bernhard
Yung, Moti
RelatedPersons_xml – sequence: 1
  givenname: Gerhard
  surname: Goos
  fullname: Goos, Gerhard
– sequence: 2
  givenname: Juris
  surname: Hartmanis
  fullname: Hartmanis, Juris
– sequence: 3
  givenname: Elisa
  surname: Bertino
  fullname: Bertino, Elisa
– sequence: 4
  givenname: Wen
  surname: Gao
  fullname: Gao, Wen
– sequence: 5
  givenname: Bernhard
  orcidid: 0000-0001-9619-1558
  surname: Steffen
  fullname: Steffen, Bernhard
– sequence: 6
  givenname: Gerhard
  orcidid: 0000-0001-8816-2693
  surname: Woeginger
  fullname: Woeginger, Gerhard
– sequence: 7
  givenname: Moti
  surname: Yung
  fullname: Yung, Moti
SourceID springer
proquest
SourceType Publisher
StartPage 273
SubjectTerms Differential privacy
Unintended memorization
Word representations
Title Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks
URI http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6326333&ppg=285
http://link.springer.com/10.1007/978-3-030-58323-1_30
Volume 12284