THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Mitigating hallucinations in large vision-language models (LVLMs) remains an open problem. Recent benchmarks do not address hallucinations in open-ended free-form responses, which we term "Type I hallucinations". Instead, they focus on hallucinations responding to very specific question fo...
Saved in:
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published |
08.05.2024
|
Subjects | |
Online Access | Get full text |
DOI | 10.48550/arxiv.2405.05256 |
Cover
Loading…
Summary: | Mitigating hallucinations in large vision-language models (LVLMs) remains an
open problem. Recent benchmarks do not address hallucinations in open-ended
free-form responses, which we term "Type I hallucinations". Instead, they focus
on hallucinations responding to very specific question formats -- typically a
multiple-choice response regarding a particular object or attribute -- which we
term "Type II hallucinations". Additionally, such benchmarks often require
external API calls to models which are subject to change. In practice, we
observe that a reduction in Type II hallucinations does not lead to a reduction
in Type I hallucinations but rather that the two forms of hallucinations are
often anti-correlated. To address this, we propose THRONE, a novel object-based
automatic framework for quantitatively evaluating Type I hallucinations in LVLM
free-form outputs. We use public language models (LMs) to identify
hallucinations in LVLM responses and compute informative metrics. By evaluating
a large selection of recent LVLMs using public datasets, we show that an
improvement in existing metrics do not lead to a reduction in Type I
hallucinations, and that established benchmarks for measuring Type I
hallucinations are incomplete. Finally, we provide a simple and effective data
augmentation method to reduce Type I and Type II hallucinations as a strong
baseline. Code is now available at https://github.com/amazon-science/THRONE . |
---|---|
DOI: | 10.48550/arxiv.2405.05256 |