Language Model Council: Democratically Benchmarking Foundation Models on Highly Subjective Tasks

As Large Language Models (LLMs) continue to evolve, the search for efficient and meaningful evaluation methods is ongoing. Many recent evaluations use LLMs as judges to score outputs from other LLMs, often relying on a single large model like GPT-4o. However, using a single LLM judge is prone to int...
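To illustrate the council idea the abstract describes, here is a minimal sketch (not the paper's implementation) of scoring a response with several LLM judges and aggregating their votes rather than relying on a single judge; the judge functions, model names, and rating scale are assumptions for demonstration only.

```python
# Minimal sketch, assuming a 1-10 rating scale and stubbed judge calls.
# None of the names below come from the paper; real judges would wrap API calls.
from statistics import mean, median
from typing import Callable, Dict


def council_score(
    prompt: str,
    response: str,
    judges: Dict[str, Callable[[str, str], float]],
) -> dict:
    """Ask every judge model to rate a response, then aggregate the votes."""
    per_judge = {name: judge(prompt, response) for name, judge in judges.items()}
    return {
        "per_judge": per_judge,                 # keep individual votes for auditing
        "mean": mean(per_judge.values()),       # simple average across the council
        "median": median(per_judge.values()),   # robust to a single outlier judge
    }


# Usage with stub judges standing in for real model calls:
judges = {
    "judge-a": lambda p, r: 7.0,
    "judge-b": lambda p, r: 8.0,
    "judge-c": lambda p, r: 6.5,
}
print(council_score("Give advice to a friend.", "Maybe talk it through...", judges))
```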

Bibliographic Details
Published in: arXiv.org
Main Authors: Zhao, Justin; Plaza-del-Arco, Flor Miriam; Genchel, Benjie; Curry, Amanda Cercas
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 21.10.2024
