Waterfall: Framework for Robust and Scalable Text Watermarking
Format | Journal Article |
---|---|
Language | English |
Published | 05.07.2024 |
Summary: | Protecting the intellectual property (IP) of text such as articles and code is
increasingly important, especially as sophisticated attacks become possible,
such as paraphrasing by large language models (LLMs) or even unauthorized
training of LLMs on copyrighted text to infringe such IP. However, existing
text watermarking methods are neither robust enough against such attacks nor
scalable to millions of users for practical implementation. In this paper, we
propose Waterfall, the first training-free framework for robust and scalable
text watermarking applicable across multiple text types (e.g., articles, code)
and languages supported by LLMs, for general text and LLM data provenance.
Waterfall comprises several key innovations, such as being the first to use LLMs
as paraphrasers for watermarking, along with a novel combination of techniques
that are surprisingly effective in achieving robust verifiability and
scalability. We empirically demonstrate that Waterfall achieves significantly
better scalability, robust verifiability, and computational efficiency than
state-of-the-art (SOTA) article-text watermarking methods, and we also show how
it can be directly applied to watermarking code. |
---|---|
DOI: | 10.48550/arxiv.2407.04411 |