GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models

In the field of autonomous vehicles (AVs), accurately discerning commander intent and executing linguistic commands within a visual context presents a significant challenge. This paper introduces a sophisticated encoder-decoder framework, developed to address visual grounding in AVs. Our Context-Awa...

Bibliographic Details
Published in Communications in transportation research Vol. 4; p. 100116
Main Authors Liao, Haicheng; Shen, Huanming; Li, Zhenning; Wang, Chengyue; Li, Guofa; Bie, Yiming; Xu, Chengzhong
Format Journal Article
Language English
Published Elsevier Ltd 01.12.2024