From Pixels to Tokens: Revisiting Object Hallucinations in Large Vision-Language Models

Hallucinations in large vision-language models (LVLMs), i.e., generating objects that are not present in the visual input, are a significant challenge that impairs their reliability. Recent studies often attribute hallucinations to a lack of understanding of visual input, yet ignore a more fundamental...

Bibliographic Details
Main Authors: Shang, Yuying; Zeng, Xinyi; Zhu, Yutao; Yang, Xiao; Fang, Zhengwei; Zhang, Jingyuan; Chen, Jiawei; Liu, Zinan; Tian, Yu
Format: Journal Article
Language: English
Published: 09.10.2024