3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

Bibliographic Details
Published in Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 508-517
Main Authors Wang, Yancheng; Xiao, Yang; Xiong, Fu; Jiang, Wenxiang; Cao, Zhiguo; Zhou, Joey Tianyi; Yuan, Junsong
Format Conference Proceeding
Language English
Published IEEE 01.06.2020
ISSN 1063-6919
DOI 10.1109/CVPR42600.2020.00059

Abstract For depth-based 3D action recognition, one essential issue is to represent 3D motion pattern effectively and efficiently. To this end, 3D dynamic voxel (3DV) is proposed as a novel 3D motion representation manner. With 3D space voxelization, the key idea of 3DV is to encode the 3D motion information within depth video into a regular voxel set (i.e., 3DV) compactly, via temporal rank pooling. Each available 3DV voxel intrinsically involves 3D spatial and motion feature for 3D action description. 3DV is then abstracted as a point set and input into PointNet++ for 3D action recognition, in the end-to-end learning way. The intuition for transferring 3DV into the point set form is that PointNet++ is lightweight and effective for deep feature learning towards point set. Since 3DV may lose appearance cues, a multi-stream 3D action recognition manner is also proposed to learn motion and appearance feature jointly. To extract richer temporal order information of actions, we also split the depth video into temporal segments and encode this procedure in 3DV integrally. The extensive experiments on the well-established benchmark datasets (e.g., NTU RGB+D 120 and NTU RGB+D 60) demonstrate the superiority of our proposition. Impressively, we acquire the accuracy of 82.4% and 93.5% on NTU RGB+D 120 with the cross-subject and cross-setup test setting respectively. 3DV's code is available at https://github.com/3huo/3DV-Action.
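The encoding step described in the abstract (voxelize each depth frame, pool over time, then treat the result as a point set) can be illustrated with a minimal sketch. It is not taken from the authors' repository. It assumes binary voxel occupancy per frame and the linear approximate rank pooling weights alpha_t = 2t - T - 1, a common simplification that this record does not confirm as the paper's exact formulation. The function names, grid size, and bounds are hypothetical.

```python
# Illustrative sketch only: names, grid size, and the pooling weights are
# assumptions, not the authors' implementation.

def voxelize(points, grid, lo, hi):
    """Return the set of occupied voxel indices for one frame of 3D points."""
    occupied = set()
    for p in points:
        idx = []
        for axis in range(3):
            span = hi[axis] - lo[axis]
            i = int((p[axis] - lo[axis]) / span * grid)
            idx.append(min(max(i, 0), grid - 1))  # clamp to the grid
        occupied.add(tuple(idx))
    return occupied


def dynamic_voxels(frames, grid=4, lo=(0.0, 0.0, 0.0), hi=(1.0, 1.0, 1.0)):
    """Pool per-frame binary occupancy over time into one motion value per voxel.

    Uses the assumed linear weights alpha_t = 2t - T - 1 (t = 1..T), so later
    frames count positively and earlier frames negatively.
    """
    T = len(frames)
    motion = {}
    for t, points in enumerate(frames, start=1):
        alpha = 2 * t - T - 1
        for v in voxelize(points, grid, lo, hi):
            motion[v] = motion.get(v, 0) + alpha
    return motion


def to_point_set(motion):
    """Flatten the voxel map into (x, y, z, motion) tuples, one per voxel."""
    return [(x, y, z, m) for (x, y, z), m in sorted(motion.items())]


# A single point that stays put for two frames, then moves along x.
frames = [[(0.1, 0.1, 0.1)], [(0.1, 0.1, 0.1)], [(0.9, 0.1, 0.1)]]
point_set = to_point_set(dynamic_voxels(frames))
```

Under these assumed weights, a voxel occupied in every frame sums to zero, so only voxels whose occupancy changes over time carry a nonzero motion value. The resulting (x, y, z, motion) tuples are the kind of point set a PointNet++-style network could consume.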
Author_xml – sequence: 1
  givenname: Yancheng
  surname: Wang
  fullname: Wang, Yancheng
  organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 2
  givenname: Yang
  surname: Xiao
  fullname: Xiao, Yang
  organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 3
  givenname: Fu
  surname: Xiong
  fullname: Xiong, Fu
  organization: Megvii Research Nanjing, Megvii Technology, China
– sequence: 4
  givenname: Wenxiang
  surname: Jiang
  fullname: Jiang, Wenxiang
  organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 5
  givenname: Zhiguo
  surname: Cao
  fullname: Cao, Zhiguo
  organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 6
  givenname: Joey Tianyi
  surname: Zhou
  fullname: Zhou, Joey Tianyi
  organization: IHPC, A*STAR, Singapore
– sequence: 7
  givenname: Junsong
  surname: Yuan
  fullname: Yuan, Junsong
  organization: CSE Department, State University of New York at Buffalo
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR42600.2020.00059
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Electronic Library (IEL)
IEEE Proceedings Order Plans (POP) 1998-present
Discipline Applied Sciences
EISBN 9781728171685
1728171687
EISSN 1063-6919
EndPage 517
ExternalDocumentID 9157595
Genre orig-research
IngestDate Wed Aug 27 02:30:35 EDT 2025
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
ParticipantIDs ieee_primary_9157595
PublicationCentury 2000
PublicationDate 2020-Jun
PublicationDateYYYYMMDD 2020-06-01
PublicationDate_xml – month: 06
  year: 2020
  text: 2020-Jun
PublicationDecade 2020
PublicationTitle Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev CVPR
PublicationYear 2020
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 508
SubjectTerms Dynamics
Machine learning
Pattern recognition
Skeleton
Solid modeling
Three-dimensional displays
Two dimensional displays
Title 3DV: 3D Dynamic Voxel for Action Recognition in Depth Video
URI https://ieeexplore.ieee.org/document/9157595