Watermark Smoothing Attacks against Language Models
Main Authors | Hongyan Chang, Hamed Hassani, Reza Shokri |
---|---|
Format | Journal Article |
Language | English |
Published | 19.07.2024 |
Subjects | Computer Science - Learning |
Online Access | https://arxiv.org/abs/2407.14206 |
Abstract | Watermarking is a technique used to embed a hidden signal in the probability
distribution of text generated by large language models (LLMs), enabling
attribution of the text to the originating model. We introduce smoothing
attacks and show that existing watermarking methods are not robust against
minor modifications of text. An adversary can use weaker language models to
smooth out the distribution perturbations caused by watermarks without
significantly compromising the quality of the generated text. The modified text
resulting from the smoothing attack remains close to the distribution of text
that the original model (without watermark) would have produced. Our attack
reveals a fundamental limitation of a wide range of watermarking techniques. |
---|---|
Author | Chang, Hongyan; Hassani, Hamed; Shokri, Reza |
BackLink | https://doi.org/10.48550/arXiv.2407.14206 (View paper in arXiv) |
Copyright | http://creativecommons.org/licenses/by/4.0 |
DOI | 10.48550/arxiv.2407.14206 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
OpenAccessLink | https://arxiv.org/abs/2407.14206 |
PublicationDate | 2024-07-19 |
SecondaryResourceType | preprint |
SubjectTerms | Computer Science - Learning |