BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
| Published in | Computer Vision - ECCV 2022, Vol. 13669, pp. 1-18 |
|---|---|
| Main Authors | Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai |
| Format | Book Chapter |
| Language | English |
| Published | Springer Nature Switzerland, 2022 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| Summary | 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best method and on par with LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer. (A minimal illustrative sketch of this query mechanism follows the record.) |
| Bibliography | Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-20077-9_1. Z. Li, W. Wang, and H. Li contributed equally. |
| ISBN | 9783031200762; 3031200764 |
| ISSN | 0302-9743; 1611-3349 |
| DOI | 10.1007/978-3-031-20077-9_1 |
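
The summary above describes BEVFormer's core mechanism: a grid of learnable BEV queries that gathers spatial information from multi-camera features via spatial cross-attention and recurrently fuses history via temporal self-attention. The sketch below is a minimal illustration of that data flow only; it is not the authors' implementation (which uses deformable attention and projects each BEV query into the camera views with calibration matrices), and all names in it (`SimpleBEVLayer`, `bev_h`, etc.) are hypothetical.

```python
# Minimal, illustrative sketch of the BEV-query idea described in the summary above.
# NOT the authors' implementation: BEVFormer uses deformable attention and projects
# each BEV query into the camera views with calibration matrices; this sketch falls
# back to plain multi-head attention over flattened image features. All names here
# (SimpleBEVLayer, bev_h, ...) are hypothetical.
import torch
import torch.nn as nn


class SimpleBEVLayer(nn.Module):
    """One encoder layer: temporal self-attention over current + history BEV queries,
    then spatial cross-attention from BEV queries to multi-camera image features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, bev_queries, prev_bev, img_feats):
        # Temporal self-attention: queries attend to themselves and, when available,
        # to the BEV features of the previous frame (ego-motion-aligned in the paper).
        kv = bev_queries if prev_bev is None else torch.cat([bev_queries, prev_bev], dim=1)
        q = self.norm1(bev_queries + self.temporal_attn(bev_queries, kv, kv)[0])
        # Spatial cross-attention: queries pull features from the multi-camera images.
        q = self.norm2(q + self.spatial_attn(q, img_feats, img_feats)[0])
        return self.norm3(q + self.ffn(q))


# Usage: a 50x50 grid of learnable BEV queries, flattened features from 6 cameras,
# and a BEV map carried over from the previous timestep.
dim, bev_h, bev_w = 256, 50, 50
bev_queries = torch.randn(1, bev_h * bev_w, dim)        # grid-shaped BEV queries
img_feats = torch.randn(1, 6 * 15 * 25, dim)            # flattened multi-camera features
prev_bev = torch.randn(1, bev_h * bev_w, dim)           # history BEV from frame t-1
layer = SimpleBEVLayer(dim)
bev = layer(bev_queries, prev_bev, img_feats)           # -> (1, 2500, 256) unified BEV
```

The recurrence mentioned in the summary comes from feeding each frame's output BEV back in as `prev_bev` for the next frame; in the paper this history is first aligned to the current frame using ego motion before the temporal self-attention.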