Pseudointelligence: A Unifying Framework for Language Model Evaluation
Main Authors | , , |
---|---|
Format | Journal Article |
Language | English |
Published | 18.10.2023 |
Summary: | With large language models surpassing human performance on an increasing number of benchmarks, we must take a principled approach to targeted evaluation of model capabilities. Inspired by pseudorandomness, we propose pseudointelligence, which captures the maxim that "(perceived) intelligence lies in the eye of the beholder". That is, claims of intelligence are meaningful only when their evaluator is taken into account. Concretely, we propose a complexity-theoretic framework of model evaluation cast as a dynamic interaction between a model and a learned evaluator. We demonstrate that this framework can be used to reason about two case studies in language model evaluation, as well as to analyze existing evaluation methods. |
---|---|
DOI: | 10.48550/arxiv.2310.12135 |