Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
The rapid proliferation of open-source language models significantly increases the risks of downstream backdoor attacks. These backdoors can introduce dangerous behaviours during model deployment and can evade detection by conventional cybersecurity monitoring systems. In this paper, we introduce a...
Main Authors | , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 03.06.2024 |
Subjects | |