Black Box Adversarial Prompting for Foundation Models
Main Authors | , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 08.02.2023 |
Subjects | |
Summary: | Prompting interfaces allow users to quickly adjust the output of generative
models in both vision and language. However, small changes and design choices
in the prompt can lead to significant differences in the output. In this work,
we develop a black-box framework for generating adversarial prompts for
unstructured image and text generation. These prompts, which can be standalone
or prepended to benign prompts, induce specific behaviors in the generative
process, such as generating images of a particular object or generating
high-perplexity text. |
---|---|
DOI: | 10.48550/arxiv.2302.04237 |
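The summary describes the framework only at a high level. To make "black-box" concrete (the search sees only scores of generated outputs, never model gradients or weights), here is a minimal sketch under stated assumptions. It is not claimed to be the paper's optimizer. It runs a simple hill-climbing random search for a prefix that is prepended to a benign prompt. `VOCAB`, `TARGET_WORDS`, `toy_black_box_score`, and `random_search_prefix` are hypothetical names introduced for illustration. The toy scorer only counts target words so the code runs offline. A real setup would instead query a generative model and score its output.

```python
import random
from typing import Callable, List, Sequence, Tuple

# Hypothetical vocabulary of candidate prefix tokens (illustrative assumption).
VOCAB: List[str] = ["dog", "cat", "sky", "zebra", "stripes", "blue", "run", "old"]

# Hypothetical target: words whose presence the toy objective rewards.
TARGET_WORDS = {"zebra", "stripes"}


def toy_black_box_score(prompt: str) -> float:
    """Stand-in for querying a generative model and scoring its output.

    Assumption: a real setup would generate an image or text from `prompt`
    and score it (e.g. a classifier's confidence that a target object appears).
    Here we only count target words so the search logic can run offline.
    """
    return float(sum(word in TARGET_WORDS for word in prompt.split()))


def random_search_prefix(
    benign_prompt: str,
    score: Callable[[str], float],
    vocab: Sequence[str],
    prefix_len: int = 4,
    n_queries: int = 200,
    seed: int = 0,
) -> Tuple[List[str], float]:
    """Hill-climbing random search for a prefix, using only score queries."""
    rng = random.Random(seed)
    # Start from a random prefix and score it prepended to the benign prompt.
    best = [rng.choice(vocab) for _ in range(prefix_len)]
    best_score = score(" ".join(best + [benign_prompt]))
    for _ in range(n_queries):
        # Propose a neighbor by resampling one randomly chosen position.
        candidate = list(best)
        candidate[rng.randrange(prefix_len)] = rng.choice(vocab)
        candidate_score = score(" ".join(candidate + [benign_prompt]))
        # Keep the candidate if it is at least as good (ties allow drifting).
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score


if __name__ == "__main__":
    prefix, value = random_search_prefix(
        "a photo of a dog", toy_black_box_score, VOCAB, n_queries=500
    )
    print(" ".join(prefix), "|", value)
```

Replacing `toy_black_box_score` with a function that calls a real model and scores what it produces would leave the loop unchanged. Hill climbing is used here only for brevity. When each query is expensive, more sample-efficient black-box optimizers, such as Bayesian optimization, are common alternatives.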