WARM: On the Benefits of Weight Averaged Reward Models
Ramé, Alexandre, Vieillard, Nino, Léonard Hussenot, Dadashi, Robert, Cideron, Geoffrey, Bachem, Olivier, Ferret, Johan
Published in arXiv.org (22.01.2024)
Paper
Journal Article
Primal Wasserstein Imitation Learning
Dadashi, Robert, Léonard Hussenot, Geist, Matthieu, Pietquin, Olivier
Published in arXiv.org (17.03.2021)
Paper
Journal Article
Learning Energy Networks with Generalized Fenchel-Young Losses
Blondel, Mathieu, Llinares-López, Felipe, Dadashi, Robert, Léonard Hussenot, Geist, Matthieu
Published in arXiv.org (12.10.2022)
Paper
Journal Article
Show me the Way: Intrinsic Motivation from Demonstrations
Léonard Hussenot, Dadashi, Robert, Geist, Matthieu, Pietquin, Olivier
Published in arXiv.org (13.01.2021)
Paper
Journal Article
WARP: On the Benefits of Weight Averaged Rewarded Policies
Ramé, Alexandre, Ferret, Johan, Vieillard, Nino, Dadashi, Robert, Léonard Hussenot, Pierre-Louis Cedoz, Sessa, Pier Giuseppe, Girgin, Sertan, Douillard, Arthur, Bachem, Olivier
Published in arXiv.org (24.06.2024)
Paper
Journal Article
vec2text with Round-Trip Translations
Cideron, Geoffrey, Girgin, Sertan, Raichuk, Anton, Pietquin, Olivier, Bachem, Olivier, Léonard Hussenot
Published in arXiv.org (14.09.2022)
Paper
Journal Article
Continuous Control with Action Quantization from Demonstrations
Dadashi, Robert, Léonard Hussenot, Vincent, Damien, Girgin, Sertan, Raichuk, Anton, Geist, Matthieu, Pietquin, Olivier
Published in arXiv.org (03.06.2022)
Paper
Journal Article
CopyCAT: Taking Control of Neural Policies with Constant Attacks
Paper
Journal Article
Offline Reinforcement Learning as Anti-Exploration
Rezaeifar, Shideh, Dadashi, Robert, Vieillard, Nino, Léonard Hussenot, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu
Published in arXiv.org (11.06.2021)
Paper
Journal Article
MusicRL: Aligning Music Generation to Human Preferences
Cideron, Geoffrey, Girgin, Sertan, Verzetti, Mauro, Vincent, Damien, Kastelic, Matej, Borsos, Zalán, McWilliams, Brian, Ungureanu, Victor, Bachem, Olivier, Pietquin, Olivier, Geist, Matthieu, Léonard Hussenot, Zeghidour, Neil, Agostinelli, Andrea
Published in arXiv.org (06.02.2024)
Paper
Journal Article
Offline Reinforcement Learning with Pseudometric Learning
Dadashi, Robert, Rezaeifar, Shideh, Vieillard, Nino, Léonard Hussenot, Pietquin, Olivier, Geist, Matthieu
Published in arXiv.org (02.06.2021)
Paper
Journal Article
Factually Consistent Summarization via Reinforcement Learning with Textual Entailment Feedback
Roit, Paul, Ferret, Johan, Shani, Lior, Aharoni, Roee, Cideron, Geoffrey, Dadashi, Robert, Geist, Matthieu, Girgin, Sertan, Léonard Hussenot, Keller, Orgad, Momchev, Nikola, Ramos, Sabela, Stanczyk, Piotr, Vieillard, Nino, Bachem, Olivier, Elidan, Gal, Hassidim, Avinatan, Pietquin, Olivier, Szpektor, Idan
Published in arXiv.org (31.05.2023)
Paper
Journal Article
RLDS: an Ecosystem to Generate, Share and Use Datasets in Reinforcement Learning
Ramos, Sabela, Girgin, Sertan, Léonard Hussenot, Vincent, Damien, Yakubovich, Hanna, Toyama, Daniel, Gergely, Anita, Stanczyk, Piotr, Marinier, Raphael, Harmsen, Jeremiah, Pietquin, Olivier, Momchev, Nikola
Published in arXiv.org (04.11.2021)
Paper
Journal Article
What Matters for Adversarial Imitation Learning?
Orsini, Manu, Raichuk, Anton, Léonard Hussenot, Vincent, Damien, Dadashi, Robert, Girgin, Sertan, Geist, Matthieu, Bachem, Olivier, Pietquin, Olivier, Andrychowicz, Marcin
Published in arXiv.org (01.06.2021)
Paper
Journal Article
RecurrentGemma: Moving Past Transformers for Efficient Open Language Models
Botev, Aleksandar, De, Soham, Smith, Samuel L, Anushan Fernando, George-Cristian Muraru, Haroun, Ruba, Berrada, Leonard, Pascanu, Razvan, Sessa, Pier Giuseppe, Dadashi, Robert, Léonard Hussenot, Ferret, Johan, Girgin, Sertan, Bachem, Olivier, Andreev, Alek, Kenealy, Kathleen, Mesnard, Thomas, Cassidy Hardin, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, Mihir Sanjay Kale, Love, Juliette, Tafti, Pouya, Joulin, Armand, Fiedel, Noah, Senter, Evan, Chen, Yutian, Srinivasan, Srivatsan, Desjardins, Guillaume, Budden, David, Doucet, Arnaud, Vikram, Sharad, Paszke, Adam, Gale, Trevor, Borgeaud, Sebastian, Chen, Charlie, Brock, Andy, Paterson, Antonia, Brennan, Jenny, Risdal, Meg, Gundluru, Raj, Devanathan, Nesh, Mooney, Paul, Chauhan, Nilay, Culliton, Phil, Martins, Luiz Gustavo, Bandy, Elisa, Huntsperger, David, Cameron, Glenn, Zucker, Arthur, Warkentin, Tris, Peran, Ludovic, Giang, Minh, Ghahramani, Zoubin, Farabet, Clément, Kavukcuoglu, Koray, Hassabis, Demis, Hadsell, Raia, Teh, Yee Whye, Nando de Freitas
Published in arXiv.org (28.08.2024)
Paper
Journal Article
Gemma 2: Improving Open Language Models at a Practical Size
Team, Gemma, Léonard Hussenot, Shahriari, Bobak, Ramé, Alexandre, Ferret, Johan, Friesen, Abe, Tsitsulin, Anton, Vieillard, Nino, Girgin, Sertan, Hoffman, Matt, Neyshabur, Behnam, Abdagic, Alvin, Carl, Amanda, Brock, Andy, Paterson, Antonia, Royal, Brandon, Weinberger, David, Vijaykumar, Dimple, Herbison, Dustin, Bandy, Elisa, Wang, Emma, Noland, Eric, Moreira, Erica, Senter, Evan, Eltyshev, Evgenii, Rasskin, Gabriel, Wei, Gary, Cameron, Glenn, Martins, Gus, Hashemi, Hadi, Hanna Klimczak-Plucińska, Zhou, Jack, Stanway, Jeff, Chan, Jetha, Jin Peng Zhou, Becker, Jocelyn, Fernandez, Joe, Joost van Amersfoort, Gordon, Josh, Lipschultz, Josh, Newlan, Josh, Badola, Kartikeya, Black, Kat, Millican, Katie, Greene, Kish, Lars Lowe Sjoesund, Usui, Lauren, Lilly McNealus, Livio Baldini Soares, Kilpatrick, Logan, Dixon, Lucas, Reid, Machel, Iverson, Mark, Miller, Matt, Rahtz, Matthew, Risdal, Meg, Rahman, Mofi, Khatwani, Mohit, Bardoliwalla, Nenshad, Dumai, Neta, Botarda, Pankil, Barham, Paul, Culliton, Phil, Comanescu, Ramona, Jana, Reena, Agarwal, Rishabh, Samaneh Saadat, Sara Mc Carthy, Perrin, Sarah, Arnold, Sébastien M R, Garg, Shruti, Chan, Susan, Eccles, Tom, Hennigan, Tom, Kocisky, Tomas, Doshi, Tulsee, Jain, Vihan, Yadav, Vikas, Dharmadhikari, Vishal, Barkley, Warren, Ye, Wenming, Han, Woohyun, Xu, Xiang, Shen, Zhe, Gong, Zhitao, Zichuan Wei, Rao, Anand, Peran, Ludovic, Warkentin, Tris, Collins, Eli, Barral, Joelle, Ghahramani, Zoubin, Dragan, Anca, Vinyals, Oriol, Kavukcuoglu, Koray, Clement Farabet, Fiedel, Noah, Kenealy, Kathleen, Dadashi, Robert, Andreev, Alek
Published in arXiv.org (02.08.2024)
Paper
Journal Article
Gemma: Open Models Based on Gemini Research and Technology
Team, Gemma, Mesnard, Thomas, Cassidy Hardin, Dadashi, Robert, Bhupatiraju, Surya, Pathak, Shreya, Sifre, Laurent, Rivière, Morgane, Mihir Sanjay Kale, Love, Juliette, Tafti, Pouya, Léonard Hussenot, Sessa, Pier Giuseppe, Chowdhery, Aakanksha, Roberts, Adam, Barua, Aditya, Castro-Ros, Alex, Slone, Ambrose, Tacchetti, Andrea, Bulanova, Anna, Paterson, Antonia, Tsai, Beth, Shahriari, Bobak, Charline Le Lan, Choquette-Choo, Christopher A, Crepy, Clément, Cer, Daniel, Ippolito, Daphne, Reid, David, Buchatskaya, Elena, Noland, Eric, Geng, Yan, Tucker, George, George-Christian Muraru, Rozhdestvenskiy, Grigory, Michalewski, Henryk, Tenney, Ian, Austin, Jacob, Keeling, James, Jean-Baptiste Lespiau, Stanway, Jeff, Brennan, Jenny, Chen, Jeremy, Ferret, Johan, Chiu, Justin, Mao-Jones, Justin, Lee, Katherine, Yu, Kathy, Millican, Katie, Lars Lowe Sjoesund, Lee, Lisa, Dixon, Lucas, Reid, Machel, Mikuła, Maciej, Wirth, Mateo, Sharman, Michael, Chinaev, Nikolai, Thain, Nithum, Bachem, Olivier, Wahltinez, Oscar, Bailey, Paige, Michel, Paul, Yotov, Petko, Chaabouni, Rahma, Comanescu, Ramona, Jana, Reena, Rohan, Anil, McIlroy, Ross, Smith, Samuel L, Borgeaud, Sebastian, Girgin, Sertan, Sholto Douglas, Pandya, Shree, Shakeri, Siamak, De, Soham, Klimenko, Ted, Hennigan, Tom, Feinberg, Vlad, Stokowiec, Wojciech, Yu-hui, Chen, Zafarali Ahmed, Gong, Zhitao, Warkentin, Tris, Peran, Ludovic, Giang, Minh, Farabet, Clément, Vinyals, Oriol, Dean, Jeff, Kavukcuoglu, Koray, Hassabis, Demis, Ghahramani, Zoubin, Eck, Douglas, Barral, Joelle, Pereira, Fernando, Collins, Eli, Joulin, Armand, Fiedel, Noah, Senter, Evan, Andreev, Alek, Kenealy, Kathleen
Published in arXiv.org (16.04.2024)
Paper
Journal Article
What Matters In On-Policy Reinforcement Learning? A Large-Scale Empirical Study
Andrychowicz, Marcin, Raichuk, Anton, Stańczyk, Piotr, Orsini, Manu, Girgin, Sertan, Marinier, Raphael, Léonard Hussenot, Geist, Matthieu, Pietquin, Olivier, Michalski, Marcin, Gelly, Sylvain, Bachem, Olivier
Published in arXiv.org (10.06.2020)
Paper
Journal Article
Acme: A Research Framework for Distributed Reinforcement Learning
Hoffman, Matthew W, Shahriari, Bobak, Aslanides, John, Barth-Maron, Gabriel, Momchev, Nikola, Sinopalnikov, Danila, Stańczyk, Piotr, Ramos, Sabela, Raichuk, Anton, Vincent, Damien, Léonard Hussenot, Dadashi, Robert, Dulac-Arnold, Gabriel, Orsini, Manu, Jacq, Alexis, Ferret, Johan, Vieillard, Nino, Seyed Kamyar Seyed Ghasemipour, Girgin, Sertan, Pietquin, Olivier, Behbahani, Feryal, Norman, Tamara, Abdolmaleki, Abbas, Cassirer, Albin, Yang, Fan, Baumli, Kate, Henderson, Sarah, Friesen, Abe, Haroun, Ruba, Novikov, Alex, Sergio Gómez Colmenarejo, Cabi, Serkan, Gulcehre, Caglar, Tom Le Paine, Srinivasan, Srivatsan, Cowie, Andrew, Wang, Ziyu, Piot, Bilal, Nando de Freitas
Published in arXiv.org (20.09.2022)
Paper
Journal Article
Conditioned Language Policy: A General Framework for Steerable Multi-Objective Finetuning
Wang, Kaiwen, Kidambi, Rahul, Sullivan, Ryan, Agarwal, Alekh, Dann, Christoph, Michi, Andrea, Gelmi, Marco, Li, Yunxuan, Gupta, Raghav, Dubey, Avinava, Ramé, Alexandre, Ferret, Johan, Cideron, Geoffrey, Hou, Le, Yu, Hongkun, Ahmed, Amr, Mehta, Aranyak, Hussenot, Léonard, Bachem, Olivier, Leurent, Edouard
Published 22.07.2024
Journal Article