Rethinking Order: A Benchmark for Small Language Model Reasoning
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV), pp. 1 - 5 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.06.2025 |
DOI | 10.1109/DLCV65218.2025.11088638 |
Summary: | Modern language models may rely on information presented in sequential order to make inferences. Human cognition, however, often operates in a fragmented and associative fashion, recalling meaning from disordered or only partially remembered material. In this work, we propose a mini evaluation benchmark for studying the ability of small-scale pre-trained language models to reason under unordered input conditions. We construct five representative reasoning scenarios: natural language inference, sentiment classification, question answering, sentence similarity, and relation extraction. By comparing inputs in their natural order with shuffled versions, we evaluate the accuracy of several small language models on these tasks. We also examine the confidence scores of the answers and the attention weights assigned to the input tokens. Experimental results show that the accuracy, confidence scores, and attention values produced for shuffled inputs are similar to those for ordered inputs, indicating that the reasoning process of these small language models is influenced little by the input semantics. This highlights the importance of future work on reasoning mechanisms that keep language models focused on semantically pivotal elements rather than on the positional indices of the input data. |
---|---|
DOI: | 10.1109/DLCV65218.2025.11088638 |
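
The summary above describes a comparison of model behaviour on ordered versus shuffled inputs, measured through accuracy, answer confidence, and attention over input tokens. Below is a minimal, hypothetical sketch of that protocol for the sentiment-classification scenario, not the authors' code: the checkpoint (distilbert-base-uncased-finetuned-sst-2-english), the word-level shuffling, and the layer- and head-averaged attention statistic are all assumptions rather than details taken from the paper.

```python
# Minimal illustrative sketch (not the paper's code) of the ordered-vs-shuffled
# comparison described in the summary, shown for the sentiment-classification task.
import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: any small pre-trained classifier works for this illustration.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def predict(text: str):
    """Return (label, confidence, per-token attention received) for one input."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    confidence, label_id = probs.max(dim=-1)
    # attentions: tuple of (batch, heads, query, key) tensors; average everything
    # except the key dimension to get how much attention each input token receives.
    attention = torch.stack(outputs.attentions).mean(dim=(0, 1, 2, 3))
    return model.config.id2label[label_id.item()], confidence.item(), attention


def shuffle_words(text: str, seed: int = 0) -> str:
    """Destroy word order while keeping the same bag of words (assumed perturbation)."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    sentence = "The movie was surprisingly good and the acting felt genuine."
    for variant in (sentence, shuffle_words(sentence)):
        label, confidence, attention = predict(variant)
        print(f"{variant!r}\n  -> {label}  confidence={confidence:.3f}  "
              f"max token attention={attention.max().item():.3f}")
```

Comparing the two printouts gives the same three quantities the summary contrasts: the predicted label (for accuracy), the softmax confidence of the answer, and an attention value per input token.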