LLMs for Relational Reasoning: How Far are We?
Published in | 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code), pp. 119–126 |
Main Authors | Li, Zhiming; Cao, Yushi; Xu, Xiufeng; Jiang, Junzhe; Liu, Xu; Teo, Yon Shin; Lin, Shang-Wei; Liu, Yang |
Format | Conference Proceeding |
Language | English |
Published | ACM, 20.04.2024 |
Subjects | Benchmark testing; Cognition; Large language models; Logic; Pipelines; Planning; Program Induction; Reinforcement learning; Relational Reasoning; Software engineering |
DOI | 10.1145/3643795.3648387 |
Abstract | Large language models (LLMs) have revolutionized many areas (e.g., natural language processing and software engineering) by achieving state-of-the-art performance on a wide range of downstream tasks. Aiming toward robust and general artificial intelligence, there has been a surge of interest in investigating the reasoning ability of LLMs. However, the textual and numerical reasoning benchmarks adopted by previous works are rather shallow and simple, so positive results on these benchmarks alone are insufficient to conclude that LLMs possess strong reasoning ability. Recent efforts evaluating LLMs on reinforcement learning benchmarks have demonstrated that they are poor at solving sequential decision-making problems that require common-sense planning. In this work, we conduct an in-depth assessment of several state-of-the-art LLMs' reasoning ability using the inductive logic programming (ILP) benchmark, which is broadly recognized as a representative and challenging measure for evaluating logic program induction/synthesis systems, as it requires inducing strict cause-effect logic to achieve robust deduction on both independent and identically distributed (IID) and out-of-distribution (OOD) test samples. Our evaluations show that, compared with neural program induction systems of much smaller model size, state-of-the-art LLMs exhibit far weaker reasoning ability, achieving much lower performance and generalization with either natural language prompting or truth-value matrix prompting. |
Author | Cao, Yushi; Jiang, Junzhe; Liu, Xu; Teo, Yon Shin; Xu, Xiufeng; Lin, Shang-Wei; Liu, Yang; Li, Zhiming |
Author_xml | – sequence: 1; name: Zhiming Li; email: zhiming001@e.ntu.edu.sg; organization: Nanyang Technological University, Continental-NTU Corporate Lab, Singapore
– sequence: 2; name: Yushi Cao; email: yushi002@e.ntu.edu.sg; organization: Nanyang Technological University, Continental-NTU Corporate Lab, Singapore
– sequence: 3; name: Xiufeng Xu; email: xiufeng001@e.ntu.edu.sg; organization: Nanyang Technological University, Singapore
– sequence: 4; name: Junzhe Jiang; email: junzhe.jiang@connect.polyu.hk; organization: Hong Kong Polytechnic University, Hong Kong
– sequence: 5; name: Xu Liu; email: liuxu@comp.nus.edu.sg; organization: National University of Singapore, Singapore
– sequence: 6; name: Yon Shin Teo; email: yon.shin.teo@continentalcorporation.com; organization: Continental Automotive Singapore Pte. Ltd., Singapore
– sequence: 7; name: Shang-Wei Lin; email: shang-wei.lin@ntu.edu.sg; organization: Nanyang Technological University, Continental-NTU Corporate Lab, Singapore
– sequence: 8; name: Yang Liu; email: yangliu@ntu.edu.sg; organization: Nanyang Technological University, Continental-NTU Corporate Lab, Singapore |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
EISBN | 9798400705793 |
EndPage | 126 |
ExternalDocumentID | 10734614 |
Genre | orig-research |
GrantInformation_xml | – fundername: National Research Foundation; funderid: 10.13039/501100001321 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/document/10734614 |
PageCount | 8 |
PublicationDate | 2024-April-20 |
PublicationTitle | 2024 IEEE/ACM International Workshop on Large Language Models for Code (LLM4Code) |
PublicationTitleAbbrev | LLM4CODE |
PublicationYear | 2024 |
Publisher | ACM |
StartPage | 119 |
SubjectTerms | Benchmark testing; Cognition; Large language models; Logic; Pipelines; Planning; Program Induction; Reinforcement learning; Relational Reasoning; Software engineering; Surges |
Title | LLMs for Relational Reasoning: How Far are We? |
URI | https://ieeexplore.ieee.org/document/10734614 |