WARM: On the Benefits of Weight Averaged Reward Models
Ramé, Alexandre, Vieillard, Nino, Hussenot, Léonard, Dadashi, Robert, Cideron, Geoffrey, Bachem, Olivier, Ferret, Johan
Year of Publication 22.01.2024
Journal Article
WARP: On the Benefits of Weight Averaged Rewarded Policies
Ramé, Alexandre, Ferret, Johan, Vieillard, Nino, Dadashi, Robert, Hussenot, Léonard, Cedoz, Pierre-Louis, Sessa, Pier Giuseppe, Girgin, Sertan, Douillard, Arthur, Bachem, Olivier
Year of Publication 24.06.2024
Journal Article
vec2text with Round-Trip Translations
Cideron, Geoffrey, Girgin, Sertan, Raichuk, Anton, Pietquin, Olivier, Bachem, Olivier, Hussenot, Léonard
Year of Publication 14.09.2022
Journal Article
Learning Energy Networks with Generalized Fenchel-Young Losses
Blondel, Mathieu, Llinares-López, Felipe, Dadashi, Robert, Hussenot, Léonard, Geist, Matthieu
Year of Publication 19.05.2022
Journal Article
MusicRL: Aligning Music Generation to Human Preferences
Cideron, Geoffrey, Girgin, Sertan, Verzetti, Mauro, Vincent, Damien, Kastelic, Matej, Borsos, Zalán, McWilliams, Brian, Ungureanu, Victor, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu, Hussenot, Léonard, Zeghidour, Neil, Agostinelli, Andrea
Year of Publication 06.02.2024
Journal Article
Show me the Way: Intrinsic Motivation from Demonstrations
Hussenot, Léonard, Dadashi, Robert, Geist, Matthieu, Pietquin, Olivier
Year of Publication 23.06.2020
Journal Article
Primal Wasserstein Imitation Learning
Dadashi, Robert, Hussenot, Léonard, Geist, Matthieu, Pietquin, Olivier
Year of Publication 08.06.2020
Journal Article
Conditional Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Wang, Kaiwen, Kidambi, Rahul, Sullivan, Ryan, Agarwal, Alekh, Dann, Christoph, Michi, Andrea, Gelmi, Marco, Li, Yunxuan, Gupta, Raghav, Dubey, Avinava, Ramé, Alexandre, Ferret, Johan, Cideron, Geoffrey, Hou, Le, Yu, Hongkun, Ahmed, Amr, Mehta, Aranyak, Hussenot, Léonard, Bachem, Olivier, Leurent, Edouard
Year of Publication 22.07.2024
Journal Article
BOND: Aligning LLMs with Best-of-N Distillation
Sessa, Pier Giuseppe, Dadashi, Robert, Hussenot, Léonard, Ferret, Johan, Vieillard, Nino, Ramé, Alexandre, Shahriari, Bobak, Perrin, Sarah, Friesen, Abe, Cideron, Geoffrey, Girgin, Sertan, Stanczyk, Piotr, Michi, Andrea, Sinopalnikov, Danila, Ramos, Sabela, Héliou, Amélie, Severyn, Aliaksei, Hoffman, Matt, Momchev, Nikola, Bachem, Olivier
Year of Publication 19.07.2024
Journal Article
Continuous Control with Action Quantization from Demonstrations
Dadashi, Robert, Hussenot, Léonard, Vincent, Damien, Girgin, Sertan, Raichuk, Anton, Geist, Matthieu, Pietquin, Olivier
Year of Publication 19.10.2021
Journal Article
Offline Reinforcement Learning as Anti-Exploration
Rezaeifar, Shideh, Dadashi, Robert, Vieillard, Nino, Hussenot, Léonard, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu
Year of Publication 11.06.2021
Journal Article
Offline Reinforcement Learning with Pseudometric Learning
Dadashi, Robert, Rezaeifar, Shideh, Vieillard, Nino, Hussenot, Léonard, Pietquin, Olivier, Geist, Matthieu
Year of Publication 02.03.2021
Journal Article
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Roit, Paul, Ferret, Johan, Shani, Lior, Aharoni, Roee, Cideron, Geoffrey, Dadashi, Robert, Geist, Matthieu, Girgin, Sertan, Hussenot, Léonard, Keller, Orgad, Momchev, Nikola, Ramos, Sabela, Stanczyk, Piotr, Vieillard, Nino, Bachem, Olivier, Elidan, Gal, Hassidim, Avinatan, Pietquin, Olivier, Szpektor, Idan
Year of Publication 31.05.2023
Journal Article
What Matters for Adversarial Imitation Learning?
Orsini, Manu, Raichuk, Anton, Hussenot, Léonard, Vincent, Damien, Dadashi, Robert, Girgin, Sertan, Geist, Matthieu, Bachem, Olivier, Pietquin, Olivier, Andrychowicz, Marcin
Year of Publication 01.06.2021
Journal Article
RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Ramos, Sabela, Girgin, Sertan, Hussenot, Léonard, Vincent, Damien, Yakubovich, Hanna, Toyama, Daniel, Gergely, Anita, Stanczyk, Piotr, Marinier, Raphael, Harmsen, Jeremiah, Pietquin, Olivier, Momchev, Nikola
Year of Publication 04.11.2021
Journal Article
WARM: On the Benefits of Weight Averaged Reward Models
Ramé, Alexandre, Vieillard, Nino, Hussenot, Léonard, Dadashi, Robert, Cideron, Geoffrey, Bachem, Olivier, Ferret, Johan
Published in arXiv.org (22.01.2024)
Paper
Gemma 2: Improving Open Language Models at a Practical Size
Sessa, Pier Giuseppe, Hardin, Cassidy, Bhupatiraju, Surya, Hussenot, Léonard, Shahriari, Bobak, Ramé, Alexandre, Ferret, Johan, Friesen, Abe, Tsitsulin, Anton, Vieillard, Nino, Girgin, Sertan, Hoffman, Matt, Grill, Jean-Bastien, Neyshabur, Behnam, Abdagic, Alvin, Carl, Amanda, Brock, Andy, Paterson, Antonia, Royal, Brandon, Choquette-Choo, Christopher A, Weinberger, David, Vijaykumar, Dimple, Herbison, Dustin, Bandy, Elisa, Wang, Emma, Noland, Eric, Moreira, Erica, Senter, Evan, Eltyshev, Evgenii, Rasskin, Gabriel, Wei, Gary, Cameron, Glenn, Martins, Gus, Hashemi, Hadi, Klimczak-Plucińska, Hanna, Zhou, Jack, Stanway, Jeff, Chan, Jetha, Becker, Jocelyn, Fernandez, Joe, Gordon, Josh, Lipschultz, Josh, Newlan, Josh, Ji, Ju-yeong, Mohamed, Kareem, Badola, Kartikeya, Black, Kat, Millican, Katie, Greene, Kish, Sjoesund, Lars Lowe, Usui, Lauren, Kilpatrick, Logan, Dixon, Lucas, Reid, Machel, Iverson, Mark, Miller, Matt, Rahtz, Matthew, Risdal, Meg, Rahman, Mofi, Khatwani, Mohit, Bardoliwalla, Nenshad, Dumai, Neta, Botarda, Pankil, Barham, Paul, Culliton, Phil, Comanescu, Ramona, Jana, Reena, Agarwal, Rishabh, Saadat, Samaneh, Cogan, Sarah, Perrin, Sarah, Arnold, Sébastien M. R, Krause, Sebastian, Garg, Shruti, Sheth, Shruti, Chan, Susan, Yu, Ting, Kocisky, Tomas, Jain, Vihan, Yadav, Vikas, Meshram, Vilobh, Dharmadhikari, Vishal, Barkley, Warren, Shen, Zhe, Gong, Zhitao, Kirk, Phoebe, Rao, Anand, Warkentin, Tris, Ghahramani, Zoubin, Hadsell, Raia, Banks, Jeanine, Dragan, Anca, Vinyals, Oriol, Dean, Jeff, Kavukcuoglu, Koray, Farabet, Clement, Fiedel, Noah, Kenealy, Kathleen, Dadashi, Robert, Andreev, Alek
Year of Publication 31.07.2024
Journal Article
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Botev, Aleksandar, De, Soham, Smith, Samuel L, Fernando, Anushan, Muraru, George-Cristian, Haroun, Ruba, Berrada, Leonard, Pascanu, Razvan, Sessa, Pier Giuseppe, Dadashi, Robert, Hussenot, Léonard, Ferret, Johan, Girgin, Sertan, Bachem, Olivier, Andreev, Alek, Kenealy, Kathleen, Mesnard, Thomas, Hardin, Cassidy, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, Kale, Mihir Sanjay, Love, Juliette, Tafti, Pouya, Joulin, Armand, Fiedel, Noah, Senter, Evan, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Budden, David, Doucet, Arnaud, Vikram, Sharad, Paszke, Adam, Gale, Trevor, Borgeaud, Sebastian, Chen, Charlie, Brock, Andy, Paterson, Antonia, Brennan, Jenny, Risdal, Meg, Gundluru, Raj, Devanathan, Nesh, Mooney, Paul, Chauhan, Nilay, Culliton, Phil, Martins, Luiz Gustavo, Bandy, Elisa, Huntsperger, David, Cameron, Glenn, Zucker, Arthur, Warkentin, Tris, Peran, Ludovic, Giang, Minh, Ghahramani, Zoubin, Farabet, Clément, Kavukcuoglu, Koray, Hassabis, Demis, Hadsell, Raia, Teh, Yee Whye, de Freitas, Nando
Year of Publication 11.04.2024
Journal Article