Watermark Smoothing Attacks against Language Models

Bibliographic Details
Main Authors Chang, Hongyan, Hassani, Hamed, Shokri, Reza
Format Journal Article
Language English
Published 19.07.2024

Abstract Watermarking is a technique used to embed a hidden signal in the probability distribution of text generated by large language models (LLMs), enabling attribution of the text to the originating model. We introduce smoothing attacks and show that existing watermarking methods are not robust against minor modifications of text. An adversary can use weaker language models to smooth out the distribution perturbations caused by watermarks without significantly compromising the quality of the generated text. The modified text resulting from the smoothing attack remains close to the distribution of text that the original model (without watermark) would have produced. Our attack reveals a fundamental limitation of a wide range of watermarking techniques.
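The core operation the abstract describes, smoothing a watermarked model's output with a weaker unwatermarked model, can be illustrated with a toy sketch. This is not the authors' implementation; the distributions, the mixing weight `lam`, and the `smoothed_distribution` helper are all hypothetical, and a real attack would operate on full LLM next-token logits rather than a 4-token vocabulary.

```python
def smoothed_distribution(p_watermarked, p_weak, lam=0.5):
    """Illustrative smoothing step: average the watermarked model's
    next-token distribution with a weaker reference model's
    distribution, damping the watermark's per-token perturbations."""
    mixed = [(1.0 - lam) * pw + lam * pr
             for pw, pr in zip(p_watermarked, p_weak)]
    total = sum(mixed)                 # renormalize against rounding error
    return [p / total for p in mixed]

# Toy vocabulary of 4 tokens. Suppose a watermark boosts token 0
# relative to a hypothetical unwatermarked base distribution.
p_base = [0.25, 0.25, 0.25, 0.25]     # original model, no watermark
p_wm   = [0.40, 0.20, 0.20, 0.20]     # same model with watermark bias
p_weak = [0.25, 0.25, 0.25, 0.25]     # weaker model, no watermark

p_smooth = smoothed_distribution(p_wm, p_weak, lam=0.5)
# Token 0's deviation from the base distribution shrinks from 0.15
# to 0.075, pulling the text back toward the unwatermarked model.
assert abs(p_smooth[0] - p_base[0]) < abs(p_wm[0] - p_base[0])
```

The adversary's trade-off, per the abstract, is that larger mixing weights erase the watermark signal more thoroughly while relying more on the weaker model, so `lam` would be tuned against text quality.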
Copyright http://creativecommons.org/licenses/by/4.0
DOI 10.48550/arxiv.2407.14206
DatabaseName arXiv Computer Science
arXiv.org
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
OpenAccessLink https://arxiv.org/abs/2407.14206
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Learning
URI https://arxiv.org/abs/2407.14206
linkProvider Cornell University