SRSU: An Online Road Map Detection and Network Estimation for Structured Bird's-Eye View Road Scene Understanding


Bibliographic Details
Published in: IEEE Transactions on Intelligent Vehicles, pp. 1-13
Main Authors: Jia, Peng; Jiang, Yahui; Ju, Zhiyang; Qi, Jianyong; Zang, Zheng; Wang, Yuchun; Gong, Jianwei
Format: Journal Article
Language: English
Published: IEEE, 24.05.2024
Summary: Autonomous driving requires a structured understanding of the surrounding road maps and road networks for navigation. However, given the flexibility of autonomous vehicles and the variation in lane curvature and shape, extracting road maps with fine-grained boundaries and road networks with lane topology online, accurately, and within a unified framework remains challenging. This paper proposes SRSU, an online road map detection and network estimation framework for structured bird's-eye view road scene understanding. Specifically, we introduce a hierarchical map representation, i.e., the road map is represented as a set of ordered point sets with equivalent permutations and the road network as a directed graph, which together describe the fine-grained map boundaries and lane topology in a unified framework. Building on this representation, we propose an online hierarchical map construction framework that uses two sets of learnable hierarchical query embeddings to extract road maps with fine-grained boundaries and road networks with lane topologies, achieving a comprehensive understanding of the road scene. Furthermore, we introduce three empirical modules to improve the accuracy of hierarchical map construction: auxiliary task prediction, which enhances the model's representational capability and provides auxiliary information for subsequent tasks; multi-modal distillation, which generates robust features for the final tasks; and higher-order interaction, which learns the association information between different tasks. Finally, experiments on the nuScenes dataset demonstrate the effectiveness of the proposed framework and highlight the superiority of the empirical modules. Code will be available at https://github.com/jiapeng789/SRSU.
ISSN: 2379-8858, 2379-8904
DOI: 10.1109/TIV.2024.3405561
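
Illustrative note: the hierarchical representation described in the summary (ordered point sets with equivalent permutations for the road map, a directed graph for the road network) can be sketched in a few lines of Python. This is a minimal sketch and not the authors' code. The names `equivalent_permutations` and `LaneGraph` are hypothetical, and the permutation convention used here (two traversal directions for an open polyline, every start point in both directions for a closed polygon) is an assumption about what "equivalent permutations" means, not something the record confirms.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple

Point = Tuple[float, float]


def equivalent_permutations(points: List[Point], closed: bool) -> List[List[Point]]:
    """Enumerate point orderings that describe the same geometry.

    Assumed convention (not taken from the paper): an open polyline has two
    equivalent orderings; a closed polygon has one per start point and direction.
    """
    if not closed:
        # An open boundary is the same shape read forwards or backwards.
        return [list(points), list(reversed(points))]
    perms: List[List[Point]] = []
    for start in range(len(points)):
        shifted = points[start:] + points[:start]  # cyclic shift of the start point
        perms.append(shifted)
        perms.append(list(reversed(shifted)))
    return perms


@dataclass
class LaneGraph:
    """Directed graph: nodes are lane segments, edges point to successor lanes."""

    successors: Dict[str, Set[str]] = field(default_factory=dict)

    def connect(self, src: str, dst: str) -> None:
        # Register both endpoints, then add the directed edge src -> dst.
        self.successors.setdefault(src, set())
        self.successors.setdefault(dst, set())
        self.successors[src].add(dst)

    def reachable(self, start: str) -> Set[str]:
        """Return every lane that can be reached by following successor edges."""
        seen: Set[str] = set()
        stack = [start]
        while stack:
            node = stack.pop()
            for nxt in self.successors.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Hypothetical example data: one open boundary, one closed region, three lane links.
boundary = [(0.0, 0.0), (1.0, 0.0), (2.0, 1.0)]
crossing = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

graph = LaneGraph()
graph.connect("lane_a", "lane_b")
graph.connect("lane_b", "lane_c")
graph.connect("lane_b", "lane_d")
```

Under these assumptions, the open boundary has 2 equivalent orderings and the four-point closed region has 8, and lanes b, c, and d are reachable from `lane_a` while nothing is reachable from `lane_c`. Keeping the geometry (point sets) separate from the topology (graph edges) mirrors the two-level description in the summary.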