LLMs for Relational Reasoning: How Far are We?

Bibliographic Details
Published in: 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), pp. 119 - 126
Main Authors: Li, Zhiming; Cao, Yushi; Xu, Xiufeng; Jiang, Junzhe; Liu, Xu; Teo, Yon Shin; Lin, Shang-Wei; Liu, Yang
Format: Conference Proceeding
Language: English
Published: ACM, 20.04.2024
DOI: 10.1145/3643795.3648387


Abstract: Large language models (LLMs) have revolutionized many areas (e.g., natural language processing, software engineering) by achieving state-of-the-art performance on a wide range of downstream tasks. Aiming at robust and general artificial intelligence, there has been a surge of interest in investigating the reasoning ability of LLMs. However, the textual and numerical reasoning benchmarks adopted by previous works are rather shallow and simple, so positive results on them alone are insufficient to conclude that LLMs possess strong reasoning ability. Recent efforts have shown, by evaluating LLMs on reinforcement learning benchmarks, that they are poor at solving sequential decision-making problems that require common-sense planning. In this work, we conduct an in-depth assessment of the reasoning ability of several state-of-the-art LLMs on the inductive logic programming (ILP) benchmark, which is broadly recognized as a representative and challenging measure for logic program induction/synthesis systems because it requires inducing strict cause-effect logic to achieve robust deduction on both independent and identically distributed (IID) and out-of-distribution (OOD) test samples. Our evaluations show that, compared with neural program induction systems of much smaller model size, state-of-the-art LLMs reason much more poorly, achieving far lower performance and generalization under either natural language prompting or truth-value matrix prompting.
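The ILP setting referenced in the abstract asks a system to induce strict cause-effect rules from example facts and then deduce correctly on unseen inputs, including out-of-distribution ones. As a toy sketch of that task shape (this is an invented illustration, not the benchmark used in the paper), the classic ancestor relation can be induced from parent facts and then checked on a chain longer than any seen during training:

```python
def transitive_closure(edges):
    """Apply the induced rules
        ancestor(X, Y) :- parent(X, Y).
        ancestor(X, Z) :- parent(Y, Z), ancestor(X, Y).
    to a fixpoint, deducing every ancestor pair from parent facts."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (x, y) in list(closure):
            for (y2, z) in edges:
                if y == y2 and (x, z) not in closure:
                    closure.add((x, z))
                    changed = True
    return closure

# "IID"-style training facts: a short parent chain a -> b -> c.
train_facts = {("a", "b"), ("b", "c")}
# "OOD"-style test facts: a chain longer than anything at training time.
test_facts = {("p", "q"), ("q", "r"), ("r", "s"), ("s", "t")}

# Robust deduction means the same rule covers both regimes.
assert ("a", "c") in transitive_closure(train_facts)
assert ("p", "t") in transitive_closure(test_facts)
assert ("t", "p") not in transitive_closure(test_facts)  # direction matters
```

The point of such benchmarks is that a correct induced rule generalizes exactly, so any drop from IID to OOD accuracy directly measures how far a system is from genuine rule induction.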
Author Cao, Yushi
Jiang, Junzhe
Liu, Xu
Teo, Yon Shin
Xu, Xiufeng
Lin, Shang-Wei
Liu, Yang
Li, Zhiming
Author_xml – sequence: 1
  givenname: Zhiming
  surname: Li
  fullname: Li, Zhiming
  email: zhiming001@e.ntu.edu.sg
  organization: Nanyang Technological University,Continental-NTU Corporate Lab,Singapore
– sequence: 2
  givenname: Yushi
  surname: Cao
  fullname: Cao, Yushi
  email: yushi002@e.ntu.edu.sg
  organization: Nanyang Technological University,Continental-NTU Corporate Lab,Singapore
– sequence: 3
  givenname: Xiufeng
  surname: Xu
  fullname: Xu, Xiufeng
  email: xiufeng001@e.ntu.edu.sg
  organization: Nanyang Technological University,Singapore
– sequence: 4
  givenname: Junzhe
  surname: Jiang
  fullname: Jiang, Junzhe
  email: junzhe.jiang@connect.polyu.hk
  organization: Hong Kong Polytechnic University,Hong Kong
– sequence: 5
  givenname: Xu
  surname: Liu
  fullname: Liu, Xu
  email: liuxu@comp.nus.edu.sg
  organization: National University of Singapore,Singapore
– sequence: 6
  givenname: Yon Shin
  surname: Teo
  fullname: Teo, Yon Shin
  email: yon.shin.teo@continentalcorporation.com
  organization: Continental Automotive Singapore Pte. Ltd.,Singapore
– sequence: 7
  givenname: Shang-Wei
  surname: Lin
  fullname: Lin, Shang-Wei
  email: shang-wei.lin@ntu.edu.sg
  organization: Nanyang Technological University,Continental-NTU Corporate Lab,Singapore
– sequence: 8
  givenname: Yang
  surname: Liu
  fullname: Liu, Yang
  email: yangliu@ntu.edu.sg
  organization: Nanyang Technological University,Continental-NTU Corporate Lab,Singapore
BookMark eNotjsFKxDAQQCMoqGvPXjzkB1onnSTTeBFZXFeoCKJ4XCbdqRRqIu2C-PcW9PTe6fHO1XHKSZS6NFAZY901eosUXLWwwYaOVBEoNBaAwFHAU1XM8xAXr50HrM9U1bZPs-7zpF9k5MOQE4-L8pzTkD5u9DZ_6w1PmifR73J7oU56Hmcp_rlSb5v71_W2bJ8fHtd3bcnG46HEgD7u0Ydu37nIPQcSQwEcxi4ieR-pqyOAEfCxF2YQxkDNMtV0PRpcqau_7iAiu69p-OTpZ2eA0Hpj8Rdl80HT
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1145/3643795.3648387
EISBN 9798400705793
EndPage 126
ExternalDocumentID 10734614
Genre orig-research
GrantInformation_xml – fundername: National Research Foundation
  funderid: 10.13039/501100001321
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
OpenAccessLink https://ieeexplore.ieee.org/document/10734614
PageCount 8
ParticipantIDs ieee_primary_10734614
PublicationCentury 2000
PublicationDate 2024-April-20
PublicationDateYYYYMMDD 2024-04-20
PublicationDate_xml – month: 04
  year: 2024
  text: 2024-April-20
  day: 20
PublicationDecade 2020
PublicationTitle 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code)
PublicationTitleAbbrev LLM4CODE
PublicationYear 2024
Publisher ACM
Publisher_xml – name: ACM
SourceID ieee
SourceType Publisher
StartPage 119
SubjectTerms Benchmark testing
Cognition
Large language models
Logic
Pipelines
Planning
Program Induction
Reinforcement learning
Relational Reasoning
Software engineering
Surges
Title LLMs for Relational Reasoning: How Far are We?
URI https://ieeexplore.ieee.org/document/10734614