Publicly-Detectable Watermarking for Language Models
| Field | Value |
|---|---|
| Main Authors | |
| Format | Journal Article |
| Language | English |
| Published | 27.10.2023 |
| Subjects | |
| Summary | We present a highly detectable, trustless watermarking scheme for LLMs: the detection algorithm contains no secret information, and it is executable by anyone. We embed a publicly-verifiable cryptographic signature into LLM output using rejection sampling. We prove that our scheme is cryptographically correct, sound, and distortion-free. We make novel uses of error-correction techniques to overcome periods of low entropy, a barrier for all prior watermarking schemes. We implement our scheme and make empirical measurements over open models in the 2.7B to 70B parameter range. Our experiments suggest that our formal claims are met in practice. |
| DOI | 10.48550/arxiv.2310.18491 |
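
The mechanism the summary describes, embedding signature bits into sampled tokens by rejection sampling so that detection needs no secret, can be illustrated in a few lines. The sketch below is a toy under strong assumptions: a uniform sampler over a small vocabulary stands in for the LLM, a fixed bit list stands in for a real digital signature over the text, and `bit_of`, `embed`, and `detect` are hypothetical names, not the paper's API. It also omits the error-correction machinery the paper uses to survive low-entropy spans.

```python
import hashlib
import random

# Toy vocabulary standing in for an LLM's token set.
VOCAB = [
    "the", "a", "cat", "dog", "bird", "fox", "runs", "sleeps",
    "jumps", "eats", "quickly", "slowly", "quietly", "today", "again", "soon",
]

def bit_of(token: str, pos: int) -> int:
    """Public hash of (position, token) to one pseudorandom bit.
    Anyone can recompute this; it is the only state the detector needs."""
    return hashlib.sha256(f"{pos}:{token}".encode()).digest()[0] & 1

def embed(signature_bits: list[int], rng: random.Random) -> list[str]:
    """Embed one signature bit per token via rejection sampling.

    Rejection sampling is expressed here as filtering: drawing uniformly from
    the tokens whose hash bit matches is distributionally identical to
    redrawing from the full (uniform) toy distribution until the bit matches.
    """
    tokens = []
    for pos, bit in enumerate(signature_bits):
        candidates = [t for t in VOCAB if bit_of(t, pos) == bit]
        if not candidates:
            # A span with no usable entropy; the paper handles this case with
            # error-correcting codes rather than failing.
            raise RuntimeError("no candidate token (low-entropy span)")
        tokens.append(rng.choice(candidates))
    return tokens

def detect(tokens: list[str]) -> list[int]:
    """Public detection: recover the embedded bits with no secret input.
    A real verifier would then check them as a signature under a public key."""
    return [bit_of(t, pos) for pos, t in enumerate(tokens)]

if __name__ == "__main__":
    rng = random.Random(0)
    sig_bits = [1, 0, 1, 1, 0, 0, 1, 0]  # toy stand-in for signature bits
    text = embed(sig_bits, rng)
    assert detect(text) == sig_bits
    print(" ".join(text))
```

Because the detector only recomputes public hashes and verifies against a public key, no secret ever leaves the generator; this is what makes the scheme trustless in the sense the summary claims.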