Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks
Published in | Text, Speech, and Dialogue Vol. 12284; pp. 273 - 281 |
---|---|
Main Authors | Thomas, Aleena; Adelani, David Ifeoluwa; Davody, Ali; Mogadala, Aditya; Klakow, Dietrich |
Format | Book Chapter |
Language | English |
Published | Switzerland: Springer International Publishing AG / Springer International Publishing, 2020 |
Series | Lecture Notes in Computer Science |
Subjects | Differential privacy; Unintended memorization; Word representations |
ISBN | 9783030583224 3030583228 |
ISSN | 0302-9743 1611-3349 |
DOI | 10.1007/978-3-030-58323-1_30 |
Abstract | Sensitive information in training data poses a privacy concern: if a model unintentionally memorizes it during training, the model becomes susceptible to membership inference and attribute inference attacks. In this paper, we investigate this problem in several pre-trained word embeddings (GloVe, ELMo, and BERT) using language models built on top of them. Specifically, sequences containing sensitive information, such as a single-word disease name and a 4-digit PIN, are first randomly inserted into the training data; a language model is then trained using word vectors as input features, and memorization is measured with a metric called exposure. The embedding dimension, the number of training epochs, and the length of the secret information were observed to affect memorization in pre-trained embeddings. Finally, to address the problem, differentially private language models were trained to reduce the exposure of sensitive information. |
---|---|
Author | Adelani, David Ifeoluwa; Davody, Ali; Mogadala, Aditya; Klakow, Dietrich; Thomas, Aleena |
Author_xml | – sequence: 1 givenname: Aleena orcidid: 0000-0003-3606-8405 surname: Thomas fullname: Thomas, Aleena email: athomas@lsv.uni-saarland.dea – sequence: 2 givenname: David Ifeoluwa surname: Adelani fullname: Adelani, David Ifeoluwa – sequence: 3 givenname: Ali surname: Davody fullname: Davody, Ali – sequence: 4 givenname: Aditya surname: Mogadala fullname: Mogadala, Aditya – sequence: 5 givenname: Dietrich surname: Klakow fullname: Klakow, Dietrich |
ContentType | Book Chapter |
Copyright | Springer Nature Switzerland AG 2020 |
Copyright_xml | – notice: Springer Nature Switzerland AG 2020 |
DBID | FFUUA |
DEWEY | 006.35 |
DOI | 10.1007/978-3-030-58323-1_30 |
DatabaseName | ProQuest Ebook Central - Book Chapters - Demo use only |
DatabaseTitleList | |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISBN | 3030583236 9783030583231 |
EISSN | 1611-3349 |
Editor | Pala, Karel; Horák, Ales; Kopeček, Ivan; Sojka, Petr |
Editor_xml | – sequence: 1 fullname: Pala, Karel – sequence: 2 fullname: Sojka, Petr – sequence: 3 fullname: Horák, Ales – sequence: 4 fullname: Kopeček, Ivan |
EndPage | 281 |
ExternalDocumentID | EBC6326333_310_285 |
ISBN | 9783030583224 3030583228 |
ISSN | 0302-9743 |
IngestDate | Tue Jul 29 20:14:37 EDT 2025 Wed May 28 23:25:52 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
LCCallNum | Q334-342 |
Language | English |
LinkModel | OpenURL |
OCLC | 1193134444 |
ORCID | 0000-0003-3606-8405 |
PQID | EBC6326333_310_285 |
PageCount | 9 |
ParticipantIDs | springer_books_10_1007_978_3_030_58323_1_30 proquest_ebookcentralchapters_6326333_310_285 |
PublicationCentury | 2000 |
PublicationDate | 2020 |
PublicationDateYYYYMMDD | 2020-01-01 |
PublicationDate_xml | – year: 2020 text: 2020 |
PublicationDecade | 2020 |
PublicationPlace | Switzerland |
PublicationPlace_xml | – name: Switzerland – name: Cham |
PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
PublicationSeriesTitle | Lecture Notes in Computer Science |
PublicationSeriesTitleAlternate | Lect.Notes Computer |
PublicationSubtitle | 23rd International Conference, TSD 2020, Brno, Czech Republic, September 8-11, 2020, Proceedings |
PublicationTitle | Text, Speech, and Dialogue |
PublicationYear | 2020 |
Publisher | Springer International Publishing AG; Springer International Publishing |
Publisher_xml | – name: Springer International Publishing AG – name: Springer International Publishing |
RelatedPersons | Hartmanis, Juris; Gao, Wen; Bertino, Elisa; Woeginger, Gerhard; Goos, Gerhard; Steffen, Bernhard; Yung, Moti |
RelatedPersons_xml | – sequence: 1 givenname: Gerhard surname: Goos fullname: Goos, Gerhard – sequence: 2 givenname: Juris surname: Hartmanis fullname: Hartmanis, Juris – sequence: 3 givenname: Elisa surname: Bertino fullname: Bertino, Elisa – sequence: 4 givenname: Wen surname: Gao fullname: Gao, Wen – sequence: 5 givenname: Bernhard orcidid: 0000-0001-9619-1558 surname: Steffen fullname: Steffen, Bernhard – sequence: 6 givenname: Gerhard orcidid: 0000-0001-8816-2693 surname: Woeginger fullname: Woeginger, Gerhard – sequence: 7 givenname: Moti surname: Yung fullname: Yung, Moti |
SSID | ssj0002426576 ssj0002792 |
Score | 2.0923278 |
SourceID | springer proquest |
SourceType | Publisher |
StartPage | 273 |
SubjectTerms | Differential privacy; Unintended memorization; Word representations |
Title | Investigating the Impact of Pre-trained Word Embeddings on Memorization in Neural Networks |
URI | http://ebookcentral.proquest.com/lib/SITE_ID/reader.action?docID=6326333&ppg=285 http://link.springer.com/10.1007/978-3-030-58323-1_30 |
Volume | 12284 |
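
The abstract measures memorization with a metric called exposure but does not define it in this record. The following is a minimal sketch, assuming the rank-based definition common in the unintended-memorization literature: exposure = log2(|R|) - log2(rank), where R is the set of candidate secrets and rank is the position of the inserted secret when candidates are sorted by the model's log-perplexity. The function `toy_log_perplexity` and the PIN `4821` are hypothetical stand-ins for a trained language model and an inserted secret; they are not taken from the chapter.

```python
import math


def exposure(secret_score, candidate_scores):
    """Rank-based exposure: log2(|R|) - log2(rank of the secret).

    Scores are log-perplexities, so lower means the model finds the
    sequence more likely. candidate_scores must cover the whole
    candidate space R, including the secret itself.
    """
    # Rank 1 means no candidate scores strictly lower than the secret.
    rank = 1 + sum(1 for s in candidate_scores if s < secret_score)
    return math.log2(len(candidate_scores)) - math.log2(rank)


def toy_log_perplexity(pin, memorized_pin):
    # Hypothetical stand-in for a trained language model (assumption):
    # every digit that differs from the memorized PIN adds 1 to the score.
    return sum(a != b for a, b in zip(pin, memorized_pin))


# Candidate space R: all 4-digit PINs, matching the abstract's example.
candidates = [f"{i:04d}" for i in range(10_000)]
secret = "4821"  # hypothetical inserted secret
scores = [toy_log_perplexity(p, secret) for p in candidates]

# A fully memorized secret ranks first, so its exposure is log2(10000),
# about 13.29 bits, the maximum for this candidate space.
secret_exposure = exposure(toy_log_perplexity(secret, secret), scores)
```

In an actual experiment the scores would come from the trained language model rather than a toy function; a differentially private model would be expected to push the inserted secret's rank toward the middle of the candidate list, lowering its exposure.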