DriveGPT4: Interpretable End-to-End Autonomous Driving Via Large Language Model

Bibliographic Details
Published in	IEEE Robotics and Automation Letters, Vol. 9, no. 10, pp. 8186-8193
Main Authors	Xu, Zhenhua; Zhang, Yujia; Xie, Enze; Zhao, Zhen; Guo, Yong; Wong, Kwan-Yee K.; Li, Zhenguo; Zhao, Hengshuang
Format	Journal Article
Language	English
Published	Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.10.2024
Subjects	Autonomous driving; Autonomous vehicles; Chatbots; Cognition; Datasets; Large language models; Qualitative analysis; Query processing; Reasoning; Tuning; Turning; Videos; Visualization
Summary:	Multimodal large language models (MLLMs) have emerged as a prominent area of interest within the research community, given their proficiency in handling and reasoning with non-textual data, including images and videos. This study seeks to extend the application of MLLMs to the realm of autonomous driving by introducing DriveGPT4, a novel interpretable end-to-end autonomous driving system based on LLMs. Capable of processing multi-frame video inputs and textual queries, DriveGPT4 facilitates the interpretation of vehicle actions, offers pertinent reasoning, and effectively addresses a diverse range of questions posed by users. Furthermore, DriveGPT4 predicts low-level vehicle control signals in an end-to-end fashion. These advanced capabilities are achieved through the utilization of a bespoke visual instruction tuning dataset, specifically tailored for autonomous driving applications, in conjunction with a mix-finetuning training strategy. DriveGPT4 represents the pioneering effort to leverage LLMs for the development of an interpretable end-to-end autonomous driving solution. Evaluations conducted on the BDD-X dataset showcase the superior qualitative and quantitative performance of DriveGPT4. Additionally, fine-tuning on domain-specific data enables DriveGPT4 to yield close or even improved results in terms of autonomous driving grounding when contrasted with GPT4-V.
ISSN:	2377-3766
DOI:	10.1109/LRA.2024.3440097