VBench: Comprehensive Benchmark Suite for Video Generative Models
Video generation has witnessed significant advance-ments, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal eval-uation system should...
Saved in:
Published in | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 21807 - 21818 |
---|---|
Main Authors | , , , , , , , , , , , , , , , |
Format | Conference Proceeding |
Language | English |
Published |
IEEE
16.06.2024
|
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Video generation has witnessed significant advance-ments, yet evaluating these models remains a challenge. A comprehensive evaluation benchmark for video generation is indispensable for two reasons: 1) Existing metrics do not fully align with human perceptions; 2) An ideal eval-uation system should provide insights to inform future de-velopments of video generation. To this end, we present VBench, a comprehensive benchmark suite that dissects "video generation quality" into specific, hierarchical, and disentangled dimensions, each with tailored prompts and evaluation methods. VBench has three appealing proper-ties: 1) Comprehensive Dimensions: VBench comprises 16 dimensions in video generation (e.g., subject identity in-consistency, motion smoothness, temporal flickering, and spatial relationship, etc.). The evaluation metrics with fine-grained levels reveal individual models' strengths and weaknesses. 2) Human Alignment: We also provide a dataset of human preference annotations to validate our benchmarks' alignment with human perception, for each evaluation dimension respectively. 3) Valuable Insights: We look into current models' ability across various evaluation dimensions, and various content types. We also investi-gate the gaps between video and image generation models. We will open-source VBench, including all prompts, evaluation methods, generated videos, and human preference an-notations, and also include more video generation models in VBench to drive forward the field of video generation. |
---|---|
ISSN: | 1063-6919 |
DOI: | 10.1109/CVPR52733.2024.02060 |