Mitigating Unintended Memorization in Language Models Via Alternating Teaching
Published in | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1 - 5 |
---|---|
Main Authors | Liu, Zhe; Zhang, Xuedong; Peng, Fuchun |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 04.06.2023 |
Subjects | automatic speech recognition; Data models; Data privacy; knowledge distillation; Language modeling; Predictive models; Privacy; Signal processing; Speech recognition; Training; unintended memorization |
Online Access | Get full text |
Abstract | Recent research has shown that language models have a tendency to memorize rare or unique sequences in the training corpora which can thus leak sensitive attributes of user data. We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect, and teachers' predictions supervise the training of a student model in an alternating manner at each time step. Experiments on LibriSpeech datasets show that the proposed method achieves superior privacy-preserving results than other counterparts. In comparison with no prevention for unintended memorization, the accuracy loss is small when training records are sufficient. |
---|---|
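The abstract describes alternating teaching at a high level: several teacher models are first trained on disjoint, privacy-sensitive shards, and the student is then supervised by a different teacher at each time step, cycling through the teachers. The sketch below illustrates that idea only; the model interfaces, the KL-based distillation loss, the temperature parameter, and the tensor shapes are assumptions made for illustration and are not details given in this record.

```python
# Minimal sketch of the alternating-teaching idea from the abstract.
# Assumptions (not from the record): student and teachers are language models
# mapping token ids of shape (batch, seq_len) to logits of shape
# (batch, seq_len, vocab); distillation uses a per-step KL divergence.
import torch
import torch.nn.functional as F

def train_student_alternating(student, teachers, batches, optimizer, tau=1.0):
    """Supervise the student with one teacher per time step, alternating
    through teachers that were pre-trained on disjoint data shards."""
    num_teachers = len(teachers)
    for tokens in batches:                        # tokens: (batch, seq_len) LongTensor
        optimizer.zero_grad()
        student_logits = student(tokens)          # (batch, seq_len, vocab)
        with torch.no_grad():
            teacher_logits = [t(tokens) for t in teachers]
        loss = 0.0
        seq_len = tokens.size(1)
        for step in range(seq_len):
            # Alternate the supervising teacher across time steps.
            t_idx = step % num_teachers
            p_teacher = F.softmax(teacher_logits[t_idx][:, step] / tau, dim=-1)
            log_p_student = F.log_softmax(student_logits[:, step] / tau, dim=-1)
            # KL(teacher || student) at this time step.
            loss = loss + F.kl_div(log_p_student, p_teacher, reduction="batchmean")
        (loss / seq_len).backward()
        optimizer.step()
```

In a typical use of this sketch, each frozen teacher would first be trained on its own shard with an ordinary next-token loss, and only the distilled student would be released, so no single training record influences more than one teacher.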
Author | Liu, Zhe (Meta AI, Menlo Park, CA, USA); Zhang, Xuedong (Meta AI, Menlo Park, CA, USA); Peng, Fuchun (Meta AI, Menlo Park, CA, USA) |
DOI | 10.1109/ICASSP49357.2023.10096557 |
Discipline | Engineering |
EISBN | 1728163277; 9781728163277 |
EISSN | 2379-190X |
EndPage | 5 |
ExternalDocumentID | 10096557 |
Genre | orig-research |
OpenAccessLink | https://doi.org/10.1109/icassp49357.2023.10096557 |
PageCount | 5 |
PublicationDate | 2023-June-4 |
PublicationTitle | ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2023 |
Publisher | IEEE |
StartPage | 1 |
SubjectTerms | automatic speech recognition; Data models; Data privacy; knowledge distillation; Language modeling; Predictive models; Privacy; Signal processing; Speech recognition; Training; unintended memorization |
Title | Mitigating Unintended Memorization in Language Models Via Alternating Teaching |
URI | https://ieeexplore.ieee.org/document/10096557 |