3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 508 - 517
Main Authors	Wang, Yancheng; Xiao, Yang; Xiong, Fu; Jiang, Wenxiang; Cao, Zhiguo; Zhou, Joey Tianyi; Yuan, Junsong
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2020
Subjects	Dynamics; Machine learning; Pattern recognition; Skeleton; Solid modeling; Three-dimensional displays; Two dimensional displays
ISSN	1063-6919
DOI	10.1109/CVPR42600.2020.00059

Abstract	For depth-based 3D action recognition, one essential issue is to represent 3D motion patterns effectively and efficiently. To this end, 3D dynamic voxel (3DV) is proposed as a novel 3D motion representation. With 3D space voxelization, the key idea of 3DV is to encode the 3D motion information within a depth video compactly into a regular voxel set (i.e., 3DV) via temporal rank pooling. Each available 3DV voxel intrinsically carries 3D spatial and motion features for 3D action description. 3DV is then abstracted as a point set and fed into PointNet++ for 3D action recognition in an end-to-end learning manner. The intuition for converting 3DV into point set form is that PointNet++ is lightweight and effective for deep feature learning on point sets. Since 3DV may lose appearance cues, a multi-stream 3D action recognition approach is also proposed to learn motion and appearance features jointly. To extract richer temporal order information of actions, we also split the depth video into temporal segments and encode this procedure in 3DV integrally. Extensive experiments on well-established benchmark datasets (e.g., NTU RGB+D 120 and NTU RGB+D 60) demonstrate the superiority of our proposition. Notably, we achieve accuracies of 82.4% and 93.5% on NTU RGB+D 120 under the cross-subject and cross-setup test settings, respectively. 3DV's code is available at https://github.com/3huo/3DV-Action.
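The pipeline the abstract outlines (voxelize each depth frame, pool the per-frame occupancy over time, then treat the surviving voxels as a point set) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the sparse dictionary layout, and the linear weights alpha_t = 2t - T - 1 (a common closed-form stand-in for rank pooling) are all assumptions made for illustration, and the PointNet++ stage is omitted.

```python
from collections import defaultdict


def voxelize(points, voxel_size):
    """Map 3D points (x, y, z) to the set of voxel indices they occupy."""
    return {tuple(int(c // voxel_size) for c in p) for p in points}


def rank_pool_weights(num_frames):
    """Linear weights alpha_t = 2t - T - 1 for t = 1..T.

    Assumption: a simple closed-form approximation of temporal rank
    pooling, used here in place of whatever solver the paper uses.
    """
    return [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]


def dynamic_voxels(frames, voxel_size):
    """Collapse per-frame occupancy into one motion value per voxel."""
    weights = rank_pool_weights(len(frames))
    motion = defaultdict(float)
    for weight, points in zip(weights, frames):
        for voxel in voxelize(points, voxel_size):
            motion[voxel] += weight
    return dict(motion)


def to_point_set(motion):
    """Turn voxels into points (i, j, k, m); drop voxels whose pooled value is 0."""
    return [(i, j, k, m) for (i, j, k), m in sorted(motion.items()) if m != 0]


# Toy "depth video": one point moves along x, one point stays still.
frames = [
    [(0.1, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(1.1, 0.0, 0.0), (5.0, 5.0, 5.0)],
    [(2.1, 0.0, 0.0), (5.0, 5.0, 5.0)],
]
point_set = to_point_set(dynamic_voxels(frames, voxel_size=1.0))
print(point_set)  # [(0, 0, 0, -2.0), (2, 0, 0, 2.0)]
```

In this toy case the static voxel pools to zero and is dropped, while the moving point leaves a negative value where it started and a positive value where it ended, so the sign encodes temporal order. A point set of this shape (coordinates plus a motion channel) is the kind of input a point-based network would consume.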
Author_xml	– sequence: 1 givenname: Yancheng surname: Wang fullname: Wang, Yancheng organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 2 givenname: Yang surname: Xiao fullname: Xiao, Yang organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 3 givenname: Fu surname: Xiong fullname: Xiong, Fu organization: Megvii Research Nanjing, Megvii Technology, China
– sequence: 4 givenname: Wenxiang surname: Jiang fullname: Jiang, Wenxiang organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 5 givenname: Zhiguo surname: Cao fullname: Cao, Zhiguo organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 6 givenname: Joey Tianyi surname: Zhou fullname: Zhou, Joey Tianyi organization: IHPC, ASTAR, Singapore
– sequence: 7 givenname: Junsong surname: Yuan fullname: Yuan, Junsong organization: CSE Department, State University of New York at Buffalo
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Electronic Library (IEL) IEEE Proceedings Order Plans (POP) 1998-present
Discipline	Applied Sciences
EISBN	9781728171685 1728171687
EISSN	1063-6919
EndPage	517
ExternalDocumentID	9157595
Genre	orig-research
IsPeerReviewed	false
IsScholarly	true
Language	English
PageCount	10
ParticipantIDs	ieee_primary_9157595
PublicationCentury	2000
PublicationDate	2020-Jun
PublicationDateYYYYMMDD	2020-06-01
PublicationDate_xml	– month: 06 year: 2020 text: 2020-Jun
PublicationDecade	2020
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationYear	2020
Publisher	IEEE
Publisher_xml	– name: IEEE
SourceID	ieee
SourceType	Publisher
StartPage	508
URI	https://ieeexplore.ieee.org/document/9157595