Pseudointelligence: A Unifying Framework for Language Model Evaluation

Bibliographic Details
Main Authors	Murty, Shikhar; Paradise, Orr; Sharma, Pratyusha
Format	Journal Article
Language	English
Published	18.10.2023
Subjects	Computer Science - Computation and Language

Summary:	With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach to targeted evaluation of model capabilities. Inspired by pseudorandomness, we propose pseudointelligence, which captures the maxim that "(perceived) intelligence lies in the eye of the beholder". That is, claims of intelligence are meaningful only when their evaluator is taken into account. Concretely, we propose a complexity-theoretic framework of model evaluation cast as a dynamic interaction between a model and a learned evaluator. We demonstrate that this framework can be used to reason about two case studies in language model evaluation, as well as to analyze existing evaluation methods.
DOI:	10.48550/arxiv.2310.12135