On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Bibliographic Details
Main Authors: Klissarov, Martin; Hjelm, Devon; Toshev, Alexander; Mazoure, Bogdan
Format: Journal Article
Language: English
Published: 07.10.2024
Subjects: Computer Science - Artificial Intelligence
Online Access: https://arxiv.org/abs/2410.05656
DOI: 10.48550/arXiv.2410.05656
Copyright: http://creativecommons.org/licenses/by/4.0

Abstract: Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we investigate the capabilities of Large Language Models (LLMs) for reinforcement learning (RL) across a diversity of interactive domains. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models to train an agent with RL. Our results show that, even without task-specific fine-tuning, LLMs excel at reward modeling. In particular, crafting rewards through artificial intelligence (AI) feedback yields the most generally applicable approach and can enhance performance by improving credit assignment and exploration. Finally, in environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities while mitigating catastrophic forgetting, further broadening their utility in sequential decision-making tasks.
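The abstract contrasts two ways of plugging an LLM into an RL loop: directly, as the policy that emits actions, and indirectly, as a reward model whose AI feedback trains a conventional agent. The sketch below illustrates that distinction only; `query_llm` is a hypothetical stand-in for any text-completion API, and the prompts and 0-10 scoring scale are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the paper's implementation) of the two usage modes
# described in the abstract: LLM-as-policy and LLM-as-reward-model.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real text-completion API."""
    raise NotImplementedError


def llm_as_policy(observation: str, actions: list[str]) -> str:
    """Direct use: the LLM picks the next action from a text observation."""
    prompt = (
        f"Observation: {observation}\n"
        f"Available actions: {', '.join(actions)}\n"
        "Reply with the single best action."
    )
    choice = query_llm(prompt).strip()
    # Fall back to a default action if the reply is not a valid action string.
    return choice if choice in actions else actions[0]


def llm_as_reward_model(goal: str, transition: str) -> float:
    """Indirect use: the LLM scores a transition (AI feedback), and the scalar
    is used as a shaped reward when training an RL agent."""
    prompt = (
        f"Task: {goal}\n"
        f"Agent transition: {transition}\n"
        "On a scale of 0 to 10, how much progress toward the task does this "
        "transition represent? Reply with a single number."
    )
    try:
        return float(query_llm(prompt).strip()) / 10.0  # normalize to [0, 1]
    except ValueError:
        return 0.0  # ignore unparseable feedback
```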