VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Bibliographic Details
Published in	2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pp. 4490-4499
Main Authors	Zhou, Yin; Tuzel, Oncel
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2018
Subjects	Encoding; Feature extraction; Laser radar; Proposals; Shape; Three-dimensional displays

Abstract	Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on LiDAR only.
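The abstract describes voxel partitioning and the VFE layer only at a high level. The pure-Python sketch below illustrates the idea under assumptions that are not stated in the abstract: points arrive as (x, y, z, reflectance), the voxel grid origin is at (0, 0, 0) with no range cropping, each voxel keeps at most a fixed number of randomly sampled points, each point is augmented with its offset from the voxel's point centroid, and a single linear layer with ReLU stands in for the learned per-point network (no batch normalization). All function names and the toy weights are hypothetical; a practical implementation would use a tensor library and trained weights.

```python
import math
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed input layout: (x, y, z, reflectance) per LiDAR point.
Point = Tuple[float, float, float, float]
VoxelKey = Tuple[int, int, int]
Layer = Tuple[List[List[float]], List[float]]  # (weights[C_out][C_in], bias[C_out])


def group_points(points: Sequence[Point],
                 voxel_size: Tuple[float, float, float],
                 max_points_per_voxel: int,
                 rng: Optional[random.Random] = None) -> Dict[VoxelKey, List[Point]]:
    """Bucket points into equally spaced voxels (grid origin assumed at 0, 0, 0),
    then randomly keep at most `max_points_per_voxel` points in each voxel."""
    rng = rng or random.Random(0)
    vx, vy, vz = voxel_size
    voxels: Dict[VoxelKey, List[Point]] = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / vx), math.floor(p[1] / vy), math.floor(p[2] / vz))
        voxels[key].append(p)
    # Only non-empty voxels become keys, so the representation stays sparse.
    return {key: (rng.sample(pts, max_points_per_voxel)
                  if len(pts) > max_points_per_voxel else pts)
            for key, pts in voxels.items()}


def augment_with_centroid(pts: Sequence[Point]) -> List[List[float]]:
    """Append each point's offset from the voxel's point centroid (7 features per point)."""
    n = len(pts)
    cx = sum(p[0] for p in pts) / n
    cy = sum(p[1] for p in pts) / n
    cz = sum(p[2] for p in pts) / n
    return [[x, y, z, r, x - cx, y - cy, z - cz] for (x, y, z, r) in pts]


def vfe_layer(features: List[List[float]], layer: Layer) -> List[List[float]]:
    """One VFE-style layer: shared linear + ReLU per point, element-wise max over the
    voxel, then the aggregated vector is concatenated onto every point (2 * C_out)."""
    weights, bias = layer
    pointwise = [[max(0.0, sum(w * x for w, x in zip(row, f)) + b)
                  for row, b in zip(weights, bias)]
                 for f in features]
    aggregated = [max(column) for column in zip(*pointwise)]
    return [h + aggregated for h in pointwise]


def encode_voxel(pts: Sequence[Point], layers: Sequence[Layer]) -> List[float]:
    """Stack VFE layers, then max-pool the final point features into one voxel feature."""
    features = augment_with_centroid(pts)
    for layer in layers:
        features = vfe_layer(features, layer)
    return [max(column) for column in zip(*features)]


# Usage with a toy cloud and hypothetical (untrained) weights for one VFE layer.
cloud = [(0.1, 0.1, 0.2, 0.5), (0.2, 0.05, 0.3, 0.7), (1.1, 0.1, 0.1, 0.9)]
voxels = group_points(cloud, voxel_size=(0.25, 0.25, 0.5), max_points_per_voxel=35)
layer1: Layer = ([[1.0, 0, 0, 0, 1.0, 0, 0], [0, 0, 0, 1.0, 0, 0, 0]], [0.0, 0.0])
voxel_features = {key: encode_voxel(pts, [layer1]) for key, pts in voxels.items()}
# voxel_features[(0, 0, 0)] is approximately [0.25, 0.7, 0.25, 0.7]
```

Storing only non-empty voxels keeps memory proportional to the number of occupied cells, which suits sparse LiDAR scans. In the pipeline the abstract outlines, these per-voxel features would form the volumetric representation passed on toward the RPN; that stage is not covered by this sketch.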
CODEN	IEEPAD
DOI	10.1109/CVPR.2018.00472
Discipline	Applied Sciences
EISBN	9781538664209; 1538664208
EISSN	1063-6919
EndPage	4499
ExternalDocumentID	8578570
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
PageCount	10
PublicationCentury	2000
PublicationDate	2018-06
PublicationDateYYYYMMDD	2018-06-01
PublicationDecade	2010
PublicationTitleAbbrev	CVPR
PublicationYear	2018
Publisher	IEEE
StartPage	4490
URI	https://ieeexplore.ieee.org/document/8578570