On the Modeling Capabilities of Large Language Models for Sequential Decision Making

Bibliographic Details
Main Authors: Klissarov, Martin; Hjelm, Devon; Toshev, Alexander; Mazoure, Bogdan
Format: Journal Article
Language: English
Published: 07.10.2024
Subjects: Computer Science - Artificial Intelligence
Online Access: https://arxiv.org/abs/2410.05656
DOI: 10.48550/arXiv.2410.05656
Copyright: http://creativecommons.org/licenses/by/4.0

Abstract: Large pretrained models are showing increasingly better performance in reasoning and planning tasks across different modalities, opening the possibility to leverage them for complex sequential decision making problems. In this paper, we investigate the capabilities of Large Language Models (LLMs) for reinforcement learning (RL) across a diversity of interactive domains. We evaluate their ability to produce decision-making policies, either directly, by generating actions, or indirectly, by first generating reward models to train an agent with RL. Our results show that, even without task-specific fine-tuning, LLMs excel at reward modeling. In particular, crafting rewards through artificial intelligence (AI) feedback yields the most generally applicable approach and can enhance performance by improving credit assignment and exploration. Finally, in environments with unfamiliar dynamics, we explore how fine-tuning LLMs with synthetic data can significantly improve their reward modeling capabilities while mitigating catastrophic forgetting, further broadening their utility in sequential decision-making tasks.
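The abstract contrasts two ways of plugging an LLM into an RL loop: directly, as the policy that emits actions, and indirectly, as a reward model whose AI feedback trains a conventional agent. The sketch below illustrates that distinction only; `query_llm` is a hypothetical stand-in for any text-completion API, and the prompts and 0-10 scoring scale are illustrative assumptions, not the authors' exact protocol.

```python
# Minimal sketch (not the paper's implementation) of the two usage modes
# described in the abstract: LLM-as-policy and LLM-as-reward-model.

def query_llm(prompt: str) -> str:
    """Hypothetical LLM call; replace with a real text-completion API."""
    raise NotImplementedError


def llm_as_policy(observation: str, actions: list[str]) -> str:
    """Direct use: the LLM picks the next action from a text observation."""
    prompt = (
        f"Observation: {observation}\n"
        f"Available actions: {', '.join(actions)}\n"
        "Reply with the single best action."
    )
    choice = query_llm(prompt).strip()
    # Fall back to a default action if the reply is not a valid action string.
    return choice if choice in actions else actions[0]


def llm_as_reward_model(goal: str, transition: str) -> float:
    """Indirect use: the LLM scores a transition (AI feedback), and the scalar
    is used as a shaped reward when training an RL agent."""
    prompt = (
        f"Task: {goal}\n"
        f"Agent transition: {transition}\n"
        "On a scale of 0 to 10, how much progress toward the task does this "
        "transition represent? Reply with a single number."
    )
    try:
        return float(query_llm(prompt).strip()) / 10.0  # normalize to [0, 1]
    except ValueError:
        return 0.0  # ignore unparseable feedback
```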