Mitigating Unintended Memorization in Language Models Via Alternating Teaching
Published in | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1 - 5 |
---|---|
Main Authors | Liu, Zhe; Zhang, Xuedong; Peng, Fuchun |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 04.06.2023 |
Subjects | automatic speech recognition; Data models; Data privacy; knowledge distillation; Language modeling; Predictive models; Privacy; Signal processing; Speech recognition; Training; unintended memorization |
Online Access | Get full text |
Abstract | Recent research has shown that language models have a tendency to memorize rare or unique sequences in the training corpora which can thus leak sensitive attributes of user data. We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect, and teachers' predictions supervise the training of a student model in an alternating manner at each time step. Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results than other counterparts. In comparison with no prevention for unintended memorization, the accuracy loss is small when training records are sufficient. |
---|---|
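The abstract describes alternating teaching at a high level: several teacher models are first trained on disjoint, privacy-sensitive shards, and the student is then supervised by a different teacher at each time step, cycling through the teachers. The sketch below illustrates that idea only; the model interfaces, the KL-based distillation loss, the temperature parameter, and the tensor shapes are assumptions made for illustration and are not details given in this record.

```python
# Minimal sketch of the alternating-teaching idea from the abstract.
# Assumptions (not from the record): student and teachers are language models
# mapping token ids of shape (batch, seq_len) to logits of shape
# (batch, seq_len, vocab); distillation uses a per-step KL divergence.
import torch
import torch.nn.functional as F

def train_student_alternating(student, teachers, batches, optimizer, tau=1.0):
    """Supervise the student with one teacher per time step, alternating
    through teachers that were pre-trained on disjoint data shards."""
    num_teachers = len(teachers)
    for tokens in batches:                        # tokens: (batch, seq_len) LongTensor
        optimizer.zero_grad()
        student_logits = student(tokens)          # (batch, seq_len, vocab)
        with torch.no_grad():
            teacher_logits = [t(tokens) for t in teachers]
        loss = 0.0
        seq_len = tokens.size(1)
        for step in range(seq_len):
            # Alternate the supervising teacher across time steps.
            t_idx = step % num_teachers
            p_teacher = F.softmax(teacher_logits[t_idx][:, step] / tau, dim=-1)
            log_p_student = F.log_softmax(student_logits[:, step] / tau, dim=-1)
            # KL(teacher || student) at this time step.
            loss = loss + F.kl_div(log_p_student, p_teacher, reduction="batchmean")
        (loss / seq_len).backward()
        optimizer.step()
```

In a typical use of this sketch, each frozen teacher would first be trained on its own shard with an ordinary next-token loss, and only the distilled student would be released, so no single training record influences more than one teacher.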
Author | Liu, Zhe (Meta AI, Menlo Park, CA, USA); Zhang, Xuedong (Meta AI, Menlo Park, CA, USA); Peng, Fuchun (Meta AI, Menlo Park, CA, USA) |
DOI | 10.1109/ICASSP49357.2023.10096557 |
Discipline | Engineering |
EISBN | 1728163277; 9781728163277 |
EISSN | 2379-190X |
EndPage | 5 |
ExternalDocumentID | 10096557 |
Genre | orig-research |
OpenAccessLink | https://doi.org/10.1109/icassp49357.2023.10096557 |
PageCount | 5 |
PublicationDate | 2023-June-4 |
PublicationTitle | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2023 |
Publisher | IEEE |
StartPage | 1 |
SubjectTerms | automatic speech recognition; Data models; Data privacy; knowledge distillation; Language modeling; Predictive models; Privacy; Signal processing; Speech recognition; Training; unintended memorization |
Title | Mitigating Unintended Memorization in Language Models Via Alternating Teaching |
URI | https://ieeexplore.ieee.org/document/10096557 |