Multi-Model Consistency for LLMs' Evaluation
Published in | 2024 International Joint Conference on Neural Networks (IJCNN), pp. 1 - 8 |
---|---|
Main Authors | |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 30.06.2024 |
Subjects | |
Summary | This paper introduces an evaluation method for large language models (LLMs) based on multi-model factual cognition consistency. Traditional evaluation methods, especially for factuality assessment, depend on constructing extensive domain-specific question sets and on the answers of specific reference models, and they fall short as model development grows more dynamic and diverse. To overcome these limitations, the proposed approach does not rely on a fixed set of standard answers; instead, it uses the responses of multiple models to construct a dynamic, relative evaluation benchmark. The authors first develop a framework that captures and compares the cognitive consistency of different models on specific questions, then design a dynamic iterative algorithm that evaluates models from these answer sets. Experiments across multiple domains demonstrate the method's effectiveness. This evaluation strategy provides a more comprehensive and flexible way to understand and assess LLM performance across scenarios, and it offers practical guidance for future model development and improvement. (An illustrative code sketch follows the record below.) |
ISSN | 2161-4407 |
DOI | 10.1109/IJCNN60899.2024.10651158 |
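
The record does not reproduce the paper's algorithm, so the following is a minimal sketch of the general idea only: scoring models by inter-model agreement, iterated to a fixed point, rather than against a fixed answer key. Everything here is an assumption for illustration, including the function names, the uniform initialization, and the exact-match `agree` heuristic (a real system would need semantic answer matching); it is not the authors' implementation.

```python
def agree(a: str, b: str) -> float:
    """Toy agreement check: exact match after normalization.
    Stand-in for a semantic matcher (assumption, not the paper's method)."""
    return 1.0 if a.strip().lower() == b.strip().lower() else 0.0


def consistency_scores(answers: dict[str, list[str]], iters: int = 20) -> dict[str, float]:
    """answers maps model name -> list of answers, one per question.

    Returns a relative score per model via a fixed-point iteration:
    a model scores well if it agrees with the score-weighted consensus,
    and the consensus weights are the current model scores."""
    models = list(answers)
    n_q = len(next(iter(answers.values())))
    scores = {m: 1.0 / len(models) for m in models}  # uniform start

    for _ in range(iters):
        new = {}
        for m in models:
            # Score-weighted agreement of m's answers with every other model's.
            total = 0.0
            for q in range(n_q):
                total += sum(
                    scores[o] * agree(answers[m][q], answers[o][q])
                    for o in models
                    if o != m
                )
            new[m] = total / n_q
        z = sum(new.values()) or 1.0
        scores = {m: s / z for m, s in new.items()}  # renormalize each round
    return scores


if __name__ == "__main__":
    # Hypothetical demo data: model_a and model_b mostly agree,
    # so the iteration ranks them above the outlier model_c.
    demo = {
        "model_a": ["paris", "4", "h2o"],
        "model_b": ["paris", "4", "co2"],
        "model_c": ["lyon", "5", "h2o"],
    }
    print(consistency_scores(demo))
```

Under this kind of scheme, models whose answers fall in the dominant agreement cluster reinforce one another across iterations, yielding a relative ranking with no ground-truth answer set, which matches the spirit, if not the detail, of the dynamic benchmark the abstract describes.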