Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Putta, Pranav, Mills, Edmund, Garg, Naman, Motwani, Sumeet, Finn, Chelsea, Garg, Divyansh, Rafailov, Rafael
Year of Publication 13.08.2024
Journal Article
Unelicitable Backdoors in Language Models via Cryptographic Transformer Circuits
Draguns, Andis, Gritsevskiy, Andrew, Motwani, Sumeet Ramesh, Rogers-Smith, Charlie, Ladish, Jeffrey, de Witt, Christian Schroeder
Year of Publication 03.06.2024
Journal Article
STARC: A General Framework For Quantifying Differences Between Reward Functions
Skalse, Joar, Farnik, Lucy, Motwani, Sumeet Ramesh, Jenner, Erik, Gleave, Adam, Abate, Alessandro
Year of Publication 26.09.2023
Journal Article
Secret Collusion among Generative AI Agents
Motwani, Sumeet Ramesh, Baranchuk, Mikhail, Strohmeier, Martin, Bolina, Vijay, Torr, Philip H. S., Hammond, Lewis, de Witt, Christian Schroeder
Year of Publication 12.02.2024
Journal Article
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents
Putta, Pranav, Mills, Edmund, Garg, Naman, Motwani, Sumeet, Finn, Chelsea, Garg, Divyansh, Rafailov, Rafael
Published in arXiv.org (13.08.2024)
Paper