Q-Adapter: Training Your LLM Adapter as a Residual Q-Function
Li, Yi-Chen; Zhang, Fuxiang; Qiu, Wenjie; Yuan, Lei; Jia, Chengxing; Zhang, Zongzhang; Yu, Yang
Published in arXiv.org (04.07.2024)
Paper
Policy Regularization with Dataset Constraint for Offline Reinforcement Learning
Ran, Yuhang; Li, Yi-Chen; Zhang, Fuxiang; Zhang, Zongzhang; Yu, Yang
Published in arXiv.org (15.08.2023)
Paper