Instruction-aligned hierarchical waypoint planner for vision-and-language navigation in continuous environments

Developing agents to follow language instructions is a compelling yet challenging research topic. Recently, vision-and-language navigation in continuous environments has been proposed to explore the multi-modal pattern analysis and mapless navigation abilities of intelligent agents. However, current...

Full description

Saved in:
Bibliographic Details
Published inPattern analysis and applications : PAA Vol. 27; no. 4
Main Authors He, Zongtao, Wang, Naijia, Wang, Liuyi, Liu, Chengju, Chen, Qijun
Format Journal Article
LanguageEnglish
Published London Springer London 01.12.2024
Springer Nature B.V
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Developing agents to follow language instructions is a compelling yet challenging research topic. Recently, vision-and-language navigation in continuous environments has been proposed to explore the multi-modal pattern analysis and mapless navigation abilities of intelligent agents. However, current waypoint-based methods still have shortcomings, such as the coupled decision process and the possible shortest path-instruction misalignment. To address these challenges, we propose an instruction-aligned hierarchical waypoint planner (IA-HWP) that ensures fine-grained waypoint prediction and enhances instruction alignment. Our HWP architecture decouples waypoint planning into a coarse view selection phase and a refined waypoint location phase, effectively improving waypoint quality and enabling specialized training supervision for different phases. In terms of instruction-aligned model design, we introduce the global action-vision co-grounding and local text-vision co-grounding modules to explicitly improve the understanding of visual landmarks and actions, thereby enhancing the alignment between instructions and trajectories. In terms of instruction-aligned model optimization, we employ reference-waypoint-oriented supervision and direction-aware loss to optimize the model for enhanced instruction following and waypoint execution capabilities. Experiments on the standard benchmark demonstrate the effectiveness of our approach, with improved success rate compared to existing methods.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ISSN:1433-7541
1433-755X
DOI:10.1007/s10044-024-01339-z