Vision Language Models in Autonomous Driving: A Survey and Outlook

Bibliographic Details
Published in	arXiv.org
Main Authors	Zhou, Xingcheng; Liu, Mingyu; Yurtsever, Ekim; Bare, Luka Zagar; Zimmer, Walter; Cao, Hu; Knoll, Alois C
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 20.06.2024
Subjects	Intelligent transportation systems; Large language models; Vehicle safety

Summary:	The application of Vision-Language Models (VLMs) in Autonomous Driving (AD) has attracted widespread attention due to their outstanding performance and their ability to leverage Large Language Models (LLMs). By incorporating language data, driving systems can gain a better understanding of real-world environments, thereby enhancing driving safety and efficiency. In this work, we present a comprehensive and systematic survey of the advances in VLMs in this domain, encompassing perception and understanding, navigation and planning, decision-making and control, end-to-end autonomous driving, and data generation. We introduce the mainstream VLM tasks in AD and the commonly used metrics. Additionally, we review current studies and applications in various areas and thoroughly summarize the existing language-enhanced autonomous driving datasets. Lastly, we discuss the benefits and challenges of VLMs in AD and outline current research gaps and future trends for researchers.
ISSN:	2331-8422