VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection
Published in | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition pp. 4490 - 4499 |
---|---|
Main Authors | Zhou, Yin; Tuzel, Oncel |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2018 |
Subjects | Encoding; Feature extraction; Laser radar; Proposals; Shape; Three-dimensional displays |
Abstract | Accurate detection of objects in 3D point clouds is a central problem in many applications, such as autonomous navigation, housekeeping robots, and augmented/virtual reality. To interface a highly sparse LiDAR point cloud with a region proposal network (RPN), most existing efforts have focused on hand-crafted feature representations, for example, a bird's eye view projection. In this work, we remove the need of manual feature engineering for 3D point clouds and propose VoxelNet, a generic 3D detection network that unifies feature extraction and bounding box prediction into a single stage, end-to-end trainable deep network. Specifically, VoxelNet divides a point cloud into equally spaced 3D voxels and transforms a group of points within each voxel into a unified feature representation through the newly introduced voxel feature encoding (VFE) layer. In this way, the point cloud is encoded as a descriptive volumetric representation, which is then connected to a RPN to generate detections. Experiments on the KITTI car detection benchmark show that VoxelNet outperforms the state-of-the-art LiDAR based 3D detection methods by a large margin. Furthermore, our network learns an effective discriminative representation of objects with various geometries, leading to encouraging results in 3D detection of pedestrians and cyclists, based on only LiDAR. |
Author | Zhou, Yin; Tuzel, Oncel |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DOI | 10.1109/CVPR.2018.00472 |
Discipline | Applied Sciences |
EISBN | 9781538664209 1538664208 |
EISSN | 1063-6919 |
EndPage | 4499 |
ExternalDocumentID | 8578570 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
PageCount | 10 |
PublicationDate | 2018-06-01 |
PublicationTitle | 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2018 |
Publisher | IEEE |
StartPage | 4490 |
SubjectTerms | Encoding; Feature extraction; Laser radar; Proposals; Shape; Three-dimensional displays |
Title | VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection |
URI | https://ieeexplore.ieee.org/document/8578570 |
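The abstract states that VoxelNet divides a point cloud into equally spaced 3D voxels and then encodes the points inside each voxel. The grouping step can be illustrated with a minimal sketch. Everything named here (`voxelize`, `centroid`, the voxel size, the sample points) is a hypothetical choice for illustration and does not come from the paper or its code. In particular, the per-voxel centroid is only a hand-written stand-in: the VFE layer described in the abstract is learned end to end and is not reproduced here.

```python
from collections import defaultdict
from math import floor


def voxelize(points, voxel_size, origin=(0.0, 0.0, 0.0)):
    """Group (x, y, z) points by the integer index of the voxel that contains them.

    voxel_size and origin are illustrative parameters, not values from the paper.
    """
    voxels = defaultdict(list)
    for p in points:
        # Equally spaced grid: the index along each axis is floor((coord - origin) / size).
        idx = tuple(floor((c - o) / s) for c, o, s in zip(p, origin, voxel_size))
        voxels[idx].append(p)
    return dict(voxels)


def centroid(group):
    """Hand-crafted placeholder feature: the mean position of the points in one voxel."""
    n = len(group)
    return tuple(sum(p[i] for p in group) / n for i in range(3))


# Four sample points; only non-empty voxels are stored, which suits sparse LiDAR data.
points = [(0.1, 0.2, 0.3), (0.4, 0.1, 0.2), (1.5, 0.2, 0.1), (-0.2, 0.3, 0.4)]
voxels = voxelize(points, voxel_size=(1.0, 1.0, 1.0))
features = {idx: centroid(group) for idx, group in voxels.items()}
```

Storing voxels in a dictionary keyed by grid index keeps memory proportional to the number of occupied voxels rather than to the full grid, which is one plausible way to handle the sparsity the abstract mentions.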