Target Networks and Over-parameterization Stabilize Off-policy Bootstrapping with Function Approximation
Che, Fengdi, Xiao, Chenjun, Mei, Jincheng, Dai, Bo, Gummadi, Ramki, Ramirez, Oscar A, Harris, Christopher K, Mahmood, A. Rupam, Schuurmans, Dale
Year of Publication 31.05.2024
Journal Article
On the Optimality of Batch Policy Optimization Algorithms
Xiao, Chenjun, Wu, Yifan, Lattimore, Tor, Dai, Bo, Mei, Jincheng, Li, Lihong, Szepesvari, Csaba, Schuurmans, Dale
Year of Publication 06.04.2021
Journal Article
Faster WIND: Accelerating Iterative Best-of-\(N\) Distillation for LLM Alignment
Yang, Tong, Mei, Jincheng, Dai, Hanjun, Wen, Zixin, Cen, Shicong, Schuurmans, Dale, Chi, Yuejie, Dai, Bo
Published in arXiv.org (28.10.2024)
Paper
Regularization and Variance-Weighted Regression Achieves Minimax Optimality in Linear MDPs: Theory and Practice
Kitamura, Toshinori, Kozuno, Tadashi, Tang, Yunhao, Vieillard, Nino, Valko, Michal, Yang, Wenhao, Mei, Jincheng, Ménard, Pierre, Azar, Mohammad Gheshlaghi, Munos, Rémi, Pietquin, Olivier, Geist, Matthieu, Szepesvári, Csaba, Kumagai, Wataru, Matsuo, Yutaka
Year of Publication 22.05.2023
Journal Article
KL-Entropy-Regularized RL with a Generative Model is Minimax Optimal
Kozuno, Tadashi, Yang, Wenhao, Vieillard, Nino, Kitamura, Toshinori, Tang, Yunhao, Mei, Jincheng, Ménard, Pierre, Azar, Mohammad Gheshlaghi, Valko, Michal, Munos, Rémi, Pietquin, Olivier, Geist, Matthieu, Szepesvári, Csaba
Year of Publication 27.05.2022
Journal Article
Understanding and Mitigating the Limitations of Prioritized Experience Replay
Pan, Yangchen, Mei, Jincheng, Farahmand, Amir-massoud, White, Martha, Yao, Hengshuai, Rohani, Mohsen, Luo, Jun
Year of Publication 18.07.2020
Journal Article
Stochastic Gradient Succeeds for Bandits
Mei, Jincheng, Zhong, Zixin, Dai, Bo, Agarwal, Alekh, Szepesvari, Csaba, Schuurmans, Dale
Published in arXiv.org (27.02.2024)
Paper
On the Global Convergence Rates of Softmax Policy Gradient Methods
Mei, Jincheng, Xiao, Chenjun, Szepesvari, Csaba, Schuurmans, Dale
Published in arXiv.org (02.06.2022)
Paper
Leveraging Non-uniformity in First-order Non-convex Optimization
Mei, Jincheng, Gao, Yue, Dai, Bo, Szepesvari, Csaba, Schuurmans, Dale
Published in arXiv.org (02.06.2022)
Paper