Multi-turn Reinforcement Learning from Preference Human Feedback
Shani, Lior, Rosenberg, Aviv, Cassel, Asaf, Lang, Oran, Calandriello, Daniele, Zipori, Avital, Noga, Hila, Keller, Orgad, Piot, Bilal, Szpektor, Idan, Hassidim, Avinatan, Matias, Yossi, Munos, Rémi
Date of Publication 23.05.2024
Journal Article
Published in arXiv.org (23.05.2024)
Paper