GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models
In the field of autonomous vehicles (AVs), accurately discerning commander intent and executing linguistic commands within a visual context present a significant challenge. This paper introduces a sophisticated encoder-decoder framework developed to address visual grounding in AVs. Our Context-Awa...
Published in | Communications in Transportation Research, Vol. 4, p. 100116 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.12.2024 |
Subjects | |
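The abstract describes cross-modal attention in which language-command features are grounded in visual features. The sketch below illustrates that general idea with standard multi-head cross-attention; the module name `CrossModalGrounding`, the feature dimensions, the residual fusion, and the box-prediction head are illustrative assumptions and do not reproduce the paper's actual architecture.

```python
# Minimal sketch of cross-modal attention for visual grounding (illustrative only;
# names, dimensions, and structure are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class CrossModalGrounding(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Language tokens act as queries; visual region features act as keys/values.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Hypothetical head predicting a normalized bounding box (cx, cy, w, h).
        self.box_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 4)
        )

    def forward(self, text_feats: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (batch, num_tokens, embed_dim), e.g. command embeddings from a language model
        # visual_feats: (batch, num_regions, embed_dim), e.g. image region features
        attended, _ = self.cross_attn(text_feats, visual_feats, visual_feats)
        fused = self.norm(text_feats + attended)   # residual fusion of language and vision
        pooled = fused.mean(dim=1)                 # pool over command tokens
        return self.box_head(pooled).sigmoid()     # normalized box coordinates

# Example usage with random tensors standing in for real encoder outputs.
model = CrossModalGrounding()
boxes = model(torch.randn(2, 12, 256), torch.randn(2, 49, 256))
print(boxes.shape)  # torch.Size([2, 4])
```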