Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks
As Large Language Models (LLMs) continue to evolve, the search for efficient and meaningful evaluation methods is ongoing. Many recent evaluations use LLMs as judges to score outputs from other LLMs, often relying on a single large model like GPT-4o. However, using a single LLM judge is prone to int...
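The title and the visible portion of the abstract contrast a single LLM judge with a council of judges that scores outputs collectively. As a rough illustration only, and not the paper's actual protocol, a council can aggregate independent per-judge scores, for example by simple averaging; the judge names and scores below are hypothetical.

```python
# Minimal sketch of a "council" of LLM judges, assuming each judge
# returns a numeric score for one candidate response. Judge names,
# the 1-10 scale, and mean aggregation are illustrative assumptions,
# not details taken from the paper.
from statistics import mean

def council_score(scores_by_judge: dict[str, float]) -> float:
    """Aggregate per-judge scores by averaging (one possible
    democratic aggregation rule among many)."""
    return mean(scores_by_judge.values())

# Hypothetical judge outputs for a single candidate response.
scores = {"judge_a": 7.0, "judge_b": 8.5, "judge_c": 6.5}
print(council_score(scores))  # 7.33...
```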
| Published in | arXiv.org |
|---|---|
| Main Authors | |
| Format | Paper |
| Language | English |
| Published | Ithaca: Cornell University Library, arXiv.org, 21.10.2024 |
| Subjects | |