Mitigating Unintended Memorization in Language Models Via Alternating Teaching

Bibliographic Details
Published in: ICASSP 2023 - 2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1-5
Main Authors: Liu, Zhe; Zhang, Xuedong; Peng, Fuchun
Format: Conference Proceeding
Language: English
Published: IEEE, 04.06.2023

Summary: Recent research has shown that language models tend to memorize rare or unique sequences in their training corpora, which can leak sensitive attributes of user data. We employ a teacher-student framework and propose a novel approach called alternating teaching to mitigate unintended memorization in sequential modeling. In our method, multiple teachers are trained on disjoint training sets whose privacy one wishes to protect, and the teachers' predictions supervise the training of a student model in an alternating manner at each time step. Experiments on LibriSpeech datasets show that the proposed method achieves better privacy-preserving results than its counterparts. Compared with applying no prevention for unintended memorization, the accuracy loss is small when training records are sufficient.
ISSN: 2379-190X
DOI: 10.1109/ICASSP49357.2023.10096557
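The summary above outlines the core recipe: teachers trained on disjoint, privacy-sensitive shards take turns supervising the student at each time step. Below is a minimal sketch of that idea, assuming a PyTorch setup; the round-robin teacher schedule, the KL-divergence distillation loss, and all function and variable names are illustrative assumptions rather than the authors' actual implementation.

```python
import torch
import torch.nn.functional as F


def alternating_teaching_loss(student_logits, teacher_logits_list):
    """Distillation loss in which the supervising teacher alternates per time step.

    student_logits:      (B, T, V) next-token logits from the student model.
    teacher_logits_list: list of (B, T, V) logits, one per teacher; each teacher
                         is assumed to have been trained on a disjoint shard of
                         the privacy-sensitive corpus.
    """
    num_teachers = len(teacher_logits_list)
    _, seq_len, _ = student_logits.shape
    log_p_student = F.log_softmax(student_logits, dim=-1)

    loss = student_logits.new_zeros(())
    for t in range(seq_len):
        # Round-robin choice: teacher (t mod K) supervises time step t.
        teacher_logits = teacher_logits_list[t % num_teachers]
        p_teacher = F.softmax(teacher_logits[:, t, :], dim=-1)
        loss = loss + F.kl_div(
            log_p_student[:, t, :], p_teacher, reduction="batchmean"
        )
    return loss / seq_len
```

In this sketch, each teacher would be fit beforehand on its own disjoint subset, and only the student, which is never trained directly on the raw private text, would be released; how the paper schedules teachers and weights the loss may differ from this round-robin assumption.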