GPT-4 enhanced multimodal grounding for autonomous driving: Leveraging cross-modal attention with large language models
In the field of autonomous vehicles (AVs), accurately discerning commander intent and executing linguistic commands within a visual context present a significant challenge. This paper introduces a sophisticated encoder-decoder framework developed to address visual grounding in AVs. Our Context-Awa...
Published in | Communications in Transportation Research, Vol. 4, p. 100116 |
---|---|
Main Authors | |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.12.2024 |
Subjects | |
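The abstract describes cross-modal attention in which language-command features are grounded in visual features. The sketch below illustrates that general idea with standard multi-head cross-attention; the module name `CrossModalGrounding`, the feature dimensions, the residual fusion, and the box-prediction head are illustrative assumptions and do not reproduce the paper's actual architecture.

```python
# Minimal sketch of cross-modal attention for visual grounding (illustrative only;
# names, dimensions, and structure are assumptions, not the paper's architecture).
import torch
import torch.nn as nn

class CrossModalGrounding(nn.Module):
    def __init__(self, embed_dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Language tokens act as queries; visual region features act as keys/values.
        self.cross_attn = nn.MultiheadAttention(embed_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(embed_dim)
        # Hypothetical head predicting a normalized bounding box (cx, cy, w, h).
        self.box_head = nn.Sequential(
            nn.Linear(embed_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 4)
        )

    def forward(self, text_feats: torch.Tensor, visual_feats: torch.Tensor) -> torch.Tensor:
        # text_feats:   (batch, num_tokens, embed_dim), e.g. command embeddings from a language model
        # visual_feats: (batch, num_regions, embed_dim), e.g. image region features
        attended, _ = self.cross_attn(text_feats, visual_feats, visual_feats)
        fused = self.norm(text_feats + attended)   # residual fusion of language and vision
        pooled = fused.mean(dim=1)                 # pool over command tokens
        return self.box_head(pooled).sigmoid()     # normalized box coordinates

# Example usage with random tensors standing in for real encoder outputs.
model = CrossModalGrounding()
boxes = model(torch.randn(2, 12, 256), torch.randn(2, 49, 256))
print(boxes.shape)  # torch.Size([2, 4])
```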