BEVFormer: Learning Bird's-Eye-View Representation from Multi-camera Images via Spatiotemporal Transformers
| Published in | Computer Vision - ECCV 2022, Vol. 13669, pp. 1-18 |
|---|---|
| Main Authors | Zhiqi Li, Wenhai Wang, Hongyang Li, Enze Xie, Chonghao Sima, Tong Lu, Yu Qiao, Jifeng Dai |
| Format | Book Chapter |
| Language | English |
| Published | Springer Nature Switzerland, 2022 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| Summary | 3D visual perception tasks, including 3D detection and map segmentation based on multi-camera images, are essential for autonomous driving systems. In this work, we present a new framework termed BEVFormer, which learns unified BEV representations with spatiotemporal transformers to support multiple autonomous driving perception tasks. In a nutshell, BEVFormer exploits both spatial and temporal information by interacting with spatial and temporal space through predefined grid-shaped BEV queries. To aggregate spatial information, we design spatial cross-attention, in which each BEV query extracts spatial features from its regions of interest across camera views. For temporal information, we propose temporal self-attention to recurrently fuse historical BEV information. Our approach achieves a new state of the art of 56.9% NDS on the nuScenes test set, 9.0 points higher than the previous best method and on par with LiDAR-based baselines. The code is available at https://github.com/zhiqi-li/BEVFormer. (A minimal illustrative sketch of this query mechanism follows the record.) |
| Bibliography | Supplementary Information: The online version contains supplementary material available at https://doi.org/10.1007/978-3-031-20077-9_1. Z. Li, W. Wang, and H. Li contributed equally. |
| ISBN | 9783031200762; 3031200764 |
| ISSN | 0302-9743; 1611-3349 |
| DOI | 10.1007/978-3-031-20077-9_1 |
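
The summary above describes BEVFormer's core mechanism: a grid of learnable BEV queries that gathers spatial information from multi-camera features via spatial cross-attention and recurrently fuses history via temporal self-attention. The sketch below is a minimal illustration of that data flow only; it is not the authors' implementation (which uses deformable attention and projects each BEV query into the camera views with calibration matrices), and all names in it (`SimpleBEVLayer`, `bev_h`, etc.) are hypothetical.

```python
# Minimal, illustrative sketch of the BEV-query idea described in the summary above.
# NOT the authors' implementation: BEVFormer uses deformable attention and projects
# each BEV query into the camera views with calibration matrices; this sketch falls
# back to plain multi-head attention over flattened image features. All names here
# (SimpleBEVLayer, bev_h, ...) are hypothetical.
import torch
import torch.nn as nn


class SimpleBEVLayer(nn.Module):
    """One encoder layer: temporal self-attention over current + history BEV queries,
    then spatial cross-attention from BEV queries to multi-camera image features."""

    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.spatial_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(), nn.Linear(4 * dim, dim))
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)

    def forward(self, bev_queries, prev_bev, img_feats):
        # Temporal self-attention: queries attend to themselves and, when available,
        # to the BEV features of the previous frame (ego-motion-aligned in the paper).
        kv = bev_queries if prev_bev is None else torch.cat([bev_queries, prev_bev], dim=1)
        q = self.norm1(bev_queries + self.temporal_attn(bev_queries, kv, kv)[0])
        # Spatial cross-attention: queries pull features from the multi-camera images.
        q = self.norm2(q + self.spatial_attn(q, img_feats, img_feats)[0])
        return self.norm3(q + self.ffn(q))


# Usage: a 50x50 grid of learnable BEV queries, flattened features from 6 cameras,
# and a BEV map carried over from the previous timestep.
dim, bev_h, bev_w = 256, 50, 50
bev_queries = torch.randn(1, bev_h * bev_w, dim)        # grid-shaped BEV queries
img_feats = torch.randn(1, 6 * 15 * 25, dim)            # flattened multi-camera features
prev_bev = torch.randn(1, bev_h * bev_w, dim)           # history BEV from frame t-1
layer = SimpleBEVLayer(dim)
bev = layer(bev_queries, prev_bev, img_feats)           # -> (1, 2500, 256) unified BEV
```

The recurrence mentioned in the summary comes from feeding each frame's output BEV back in as `prev_bev` for the next frame; in the paper this history is first aligned to the current frame using ego motion before the temporal self-attention.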