LoRA Fine-tuning Efficiently Undoes Safety Training in Llama 2-Chat 70B
Paper
Journal Article
BadLlama: cheaply removing safety fine-tuning from Llama 2-Chat 13B
Gade, Pranav; Lermen, Simon; Rogers-Smith, Charlie; Ladish, Jeffrey
Published in arXiv.org (21.03.2024)
Paper
Journal Article
Evaluating Shutdown Avoidance of Language Models in Textual Scenarios
Paper
Journal Article