VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection

Bibliographic Details
Published in: 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 4490-4499
Main Authors: Zhou, Yin; Tuzel, Oncel
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2018
Abstract: Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need for manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single-stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms the group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to an RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR-based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based only on LiDAR.
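The voxel partitioning step named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `group_points_by_voxel` and `voxel_centroids`, the `voxel_size` and `origin` parameters, and the use of a per-voxel mean are all assumptions made for illustration. In particular, the mean is only a hand-written stand-in for the per-voxel feature, which VoxelNet learns with VFE layers.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
VoxelIndex = Tuple[int, int, int]


def group_points_by_voxel(
    points: List[Point],
    voxel_size: Tuple[float, float, float],
    origin: Point = (0.0, 0.0, 0.0),
) -> Dict[VoxelIndex, List[Point]]:
    """Assign each point to the equally spaced voxel that contains it.

    Hypothetical helper: only non-empty voxels are stored, which suits
    sparse LiDAR data.
    """
    voxels: Dict[VoxelIndex, List[Point]] = defaultdict(list)
    for point in points:
        # Integer voxel coordinate per axis: floor((coordinate - origin) / edge length).
        index = tuple(
            int((c - o) // s) for c, o, s in zip(point, origin, voxel_size)
        )
        voxels[index].append(point)
    return dict(voxels)


def voxel_centroids(
    voxels: Dict[VoxelIndex, List[Point]],
) -> Dict[VoxelIndex, Point]:
    """Illustrative per-voxel feature: the mean of the points in each voxel.

    This is an assumption for the sketch; it is not the learned VFE feature.
    """
    return {
        index: tuple(sum(axis) / len(pts) for axis in zip(*pts))
        for index, pts in voxels.items()
    }
```

Calling `group_points_by_voxel(points, (1.0, 1.0, 1.0))` on a list of `(x, y, z)` tuples returns a dictionary keyed by integer voxel index, and `voxel_centroids` reduces each group to a single vector, mirroring the "group of points to one representation" idea at a much simpler level.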
CODEN IEEPAD
DOI 10.1109/CVPR.2018.00472
Discipline Applied Sciences
EISBN 9781538664209 (ISBN-13), 1538664208 (ISBN-10)
EISSN 1063-6919
EndPage 4499
ExternalDocumentID 8578570
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationDate 2018-06
PublicationTitle 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition
PublicationTitleAbbrev CVPR
PublicationYear 2018
Publisher IEEE
StartPage 4490
SubjectTerms Encoding
Feature extraction
Laser radar
Proposals
Shape
Three-dimensional displays
Title VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
URI https://ieeexplore.ieee.org/document/8578570