Rethinking Order: A Benchmark for Small Language Model Reasoning
Published in | 2025 IEEE 2nd International Conference on Deep Learning and Computer Vision (DLCV), pp. 1 - 5 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 06.06.2025 |
DOI | 10.1109/DLCV65218.2025.11088638 |
Summary: | Modern language models may rely on information presented in sequential order to make inferences. Human cognition, however, often operates in a fragmented and associative fashion, recalling meaning from disordered or only partially remembered material. In this work, we propose a mini evaluation benchmark for studying the ability of small-scale pre-trained language models to reason under unordered input conditions. We construct five representative reasoning scenarios: natural language inference, sentiment classification, question answering, sentence similarity, and relation extraction. By comparing inputs in their natural order with shuffled versions, we evaluate the accuracy of several small language models on these tasks. We also examine the confidence scores of the answers and the attention weights assigned to the input tokens. Experimental results show that the accuracy, confidence scores, and attention values produced for shuffled inputs are similar to those for ordered inputs, indicating that the reasoning process of these small language models is influenced little by the input semantics. This highlights the importance of future work on reasoning mechanisms that keep language models focused on semantically pivotal elements rather than on the positional indices of the input data. |
---|---|
DOI: | 10.1109/DLCV65218.2025.11088638 |
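
The summary above describes a comparison of model behaviour on ordered versus shuffled inputs, measured through accuracy, answer confidence, and attention over input tokens. Below is a minimal, hypothetical sketch of that protocol for the sentiment-classification scenario, not the authors' code: the checkpoint (distilbert-base-uncased-finetuned-sst-2-english), the word-level shuffling, and the layer- and head-averaged attention statistic are all assumptions rather than details taken from the paper.

```python
# Minimal illustrative sketch (not the paper's code) of the ordered-vs-shuffled
# comparison described in the summary, shown for the sentiment-classification task.
import random

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Assumed checkpoint: any small pre-trained classifier works for this illustration.
MODEL_NAME = "distilbert-base-uncased-finetuned-sst-2-english"
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME)
model.eval()


def predict(text: str):
    """Return (label, confidence, per-token attention received) for one input."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs, output_attentions=True)
    probs = torch.softmax(outputs.logits, dim=-1)[0]
    confidence, label_id = probs.max(dim=-1)
    # attentions: tuple of (batch, heads, query, key) tensors; average everything
    # except the key dimension to get how much attention each input token receives.
    attention = torch.stack(outputs.attentions).mean(dim=(0, 1, 2, 3))
    return model.config.id2label[label_id.item()], confidence.item(), attention


def shuffle_words(text: str, seed: int = 0) -> str:
    """Destroy word order while keeping the same bag of words (assumed perturbation)."""
    words = text.split()
    random.Random(seed).shuffle(words)
    return " ".join(words)


if __name__ == "__main__":
    sentence = "The movie was surprisingly good and the acting felt genuine."
    for variant in (sentence, shuffle_words(sentence)):
        label, confidence, attention = predict(variant)
        print(f"{variant!r}\n  -> {label}  confidence={confidence:.3f}  "
              f"max token attention={attention.max().item():.3f}")
```

Comparing the two printouts gives the same three quantities the summary contrasts: the predicted label (for accuracy), the softmax confidence of the answer, and an attention value per input token.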