Watermark Smoothing Attacks against Language Models

Bibliographic Details
Main Authors Chang, Hongyan, Hassani, Hamed, Shokri, Reza
Format Journal Article
Language English
Published 19.07.2024

Abstract Watermarking is a technique used to embed a hidden signal in the probability distribution of text generated by large language models (LLMs), enabling attribution of the text to the originating model. We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text. An adversary can use weaker language models to smooth out the distribution perturbations caused by watermarks without significantly compromising the quality of the generated text. The modified text resulting from the smoothing attack remains close to the distribution of text that the original model (without watermark) would have produced. Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
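The core operation the abstract describes, smoothing a watermarked model's output with a weaker unwatermarked model, can be illustrated with a toy sketch. This is not the authors' implementation; the distributions, the mixing weight `lam`, and the `smoothed_distribution` helper are all hypothetical, and a real attack would operate on full LLM next-token logits rather than a 4-token vocabulary.

```python
def smoothed_distribution(p_watermarked, p_weak, lam=0.5):
    """Illustrative smoothing step: average the watermarked model's
    next-token distribution with a weaker reference model's
    distribution, damping the watermark's per-token perturbations."""
    mixed = [(1.0 - lam) * pw + lam * pr
             for pw, pr in zip(p_watermarked, p_weak)]
    total = sum(mixed)                 # renormalize against rounding error
    return [p / total for p in mixed]

# Toy vocabulary of 4 tokens. Suppose a watermark boosts token 0
# relative to a hypothetical unwatermarked base distribution.
p_base = [0.25, 0.25, 0.25, 0.25]     # original model, no watermark
p_wm   = [0.40, 0.20, 0.20, 0.20]     # same model with watermark bias
p_weak = [0.25, 0.25, 0.25, 0.25]     # weaker model, no watermark

p_smooth = smoothed_distribution(p_wm, p_weak, lam=0.5)
# Token 0's deviation from the base distribution shrinks from 0.15
# to 0.075, pulling the text back toward the unwatermarked model.
assert abs(p_smooth[0] - p_base[0]) < abs(p_wm[0] - p_base[0])
```

The adversary's trade-off, per the abstract, is that larger mixing weights erase the watermark signal more thoroughly while relying more on the weaker model, so `lam` would be tuned against text quality.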
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2407.14206
DatabaseName arXiv Computer Science
arXiv.org
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
OpenAccessLink https://arxiv.org/abs/2407.14206
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Learning
URI https://arxiv.org/abs/2407.14206
linkProvider Cornell University