VILA: On Pre-training for Visual Language Models

Visual language models (VLMs) have progressed rapidly with the recent success of large language models (LLMs). There have been growing efforts on visual instruction tuning to extend the LLM with visual inputs, but the field lacks an in-depth study of the visual language pre-training process, where the model learns to per...

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 26679 - 26689
Main Authors	Lin, Ji; Yin, Hongxu; Ping, Wei; Molchanov, Pavlo; Shoeybi, Mohammad; Han, Song
Format	Conference Proceeding
Language	English
Published	IEEE 16.06.2024
Subjects	Accuracy; Benchmark testing; Cognition; Computer vision; Degradation; Large language models; Visualization
