Mitigating Unintended Memorization in Language Models Via Alternating Teaching

Bibliographic Details
Published in ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors Liu, Zhe; Zhang, Xuedong; Peng, Fuchun
Format Conference Proceeding
Language English
Published IEEE 04.06.2023
Abstract Recent research has shown that language models tend to memorize rare or unique sequences in their training corpora, which can leak sensitive attributes of user data. We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect, and the teachers' predictions supervise the training of a student model in an alternating manner at each time step. Experiments on LibriSpeech datasets show that the proposed method achieves better privacy-preserving results than its counterparts. Compared with no prevention of unintended memorization, the accuracy loss is small when training records are sufficient.
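The alternating supervision described in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the abstract does not state how teachers are selected at each time step or which loss is used, so the round-robin schedule (teacher `t mod K` at step `t`) and the cross-entropy distillation loss below are assumptions, and the function names `alternating_targets` and `distillation_loss` are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def alternating_targets(teacher_probs):
    """Pick one teacher's distribution per time step.

    teacher_probs[k][t] is teacher k's predicted distribution at step t.
    ASSUMPTION: teachers are cycled round-robin (teacher t mod K at step t);
    the abstract only says supervision alternates at each time step.
    """
    num_teachers = len(teacher_probs)
    num_steps = len(teacher_probs[0])
    return [teacher_probs[t % num_teachers][t] for t in range(num_steps)]


def distillation_loss(student_logits, targets):
    """Mean cross-entropy between the selected teacher targets and the
    student's softmax output. ASSUMPTION: the loss form is illustrative."""
    total = 0.0
    for logits, target in zip(student_logits, targets):
        probs = softmax(logits)
        total -= sum(p * math.log(q) for p, q in zip(target, probs) if p > 0.0)
    return total / len(targets)


# Two teachers (trained on disjoint data in the paper's setting), three steps,
# vocabulary of size two. The numbers are made up for illustration.
teacher_a = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]]
teacher_b = [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]]
targets = alternating_targets([teacher_a, teacher_b])
loss = distillation_loss([[0.0, 0.0]] * 3, targets)
```

In this sketch, steps 0 and 2 are supervised by `teacher_a` and step 1 by `teacher_b`, so the student never sees a full sequence of predictions from any single teacher.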
Author Liu, Zhe (Meta AI, Menlo Park, CA, USA)
Zhang, Xuedong (Meta AI, Menlo Park, CA, USA)
Peng, Fuchun (Meta AI, Menlo Park, CA, USA)
DOI 10.1109/ICASSP49357.2023.10096557
Discipline Engineering
EISBN 1728163277
9781728163277
EISSN 2379-190X
EndPage 5
ExternalDocumentID 10096557
Genre orig-research
IngestDate Wed Jun 26 19:24:40 EDT 2024
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed false
IsScholarly true
Language English
LinkModel DirectLink
OpenAccessLink https://doi.org/10.1109/icassp49357.2023.10096557
PageCount 5
ParticipantIDs ieee_primary_10096557
PublicationCentury 2000
PublicationDate 2023-June-4
PublicationDateYYYYMMDD 2023-06-04
PublicationDate_xml – month: 06
  year: 2023
  text: 2023-June-4
  day: 04
PublicationDecade 2020
PublicationTitle ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
PublicationTitleAbbrev ICASSP
PublicationYear 2023
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 1
SubjectTerms automatic speech recognition
Data models
Data privacy
knowledge distillation
Language modeling
Predictive models
Privacy
Signal processing
Speech recognition
Training
unintended memorization
Title Mitigating Unintended Memorization in Language Models Via Alternating Teaching
URI https://ieeexplore.ieee.org/document/10096557