3DV: 3D Dynamic Voxel for Action Recognition in Depth Video
Published in | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 508 - 517 |
---|---|
Main Authors | Wang, Yancheng; Xiao, Yang; Xiong, Fu; Jiang, Wenxiang; Cao, Zhiguo; Zhou, Joey Tianyi; Yuan, Junsong |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2020 |
Subjects | |
ISSN | 1063-6919 |
DOI | 10.1109/CVPR42600.2020.00059 |
Abstract | For depth-based 3D action recognition, one essential issue is to represent 3D motion patterns effectively and efficiently. To this end, 3D dynamic voxel (3DV) is proposed as a novel 3D motion representation. With 3D space voxelization, the key idea of 3DV is to encode the 3D motion information within depth video compactly into a regular voxel set (i.e., 3DV) via temporal rank pooling. Each available 3DV voxel intrinsically carries both 3D spatial and motion features for 3D action description. 3DV is then abstracted as a point set and fed into PointNet++ for 3D action recognition in an end-to-end learning manner. The intuition for converting 3DV into point set form is that PointNet++ is lightweight and effective for deep feature learning on point sets. Since 3DV may lose appearance cues, a multi-stream 3D action recognition approach is also proposed to learn motion and appearance features jointly. To extract richer temporal order information of actions, we also split the depth video into temporal segments and encode this procedure in 3DV integrally. Extensive experiments on well-established benchmark datasets (e.g., NTU RGB+D 120 and NTU RGB+D 60) demonstrate the superiority of our approach. Notably, we achieve accuracies of 82.4% and 93.5% on NTU RGB+D 120 under the cross-subject and cross-setup test settings, respectively. 3DV's code is available at https://github.com/3huo/3DV-Action. |
---|---|
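The abstract describes 3DV only at a high level. Below is a minimal, hypothetical Python sketch of the idea, not code from the 3DV-Action repository. It assumes each depth frame has already been converted to a 3D point cloud, that voxel occupancy is binary, and that temporal rank pooling is approximated with the closed-form weights alpha_t = 2t - T - 1 from approximate rank pooling (dynamic images); that weighting is a swapped-in stand-in and not necessarily the solver used in the paper. The function names and the per-point feature layout (voxel centre, one global motion value, one motion value per temporal segment) are illustrative assumptions.

```python
"""Hypothetical sketch of a 3DV-style pipeline: voxelize, rank-pool, abstract as points.

Assumptions (not taken from the paper): binary occupancy per voxel, approximate
rank pooling weights alpha_t = 2t - T - 1, and an
(x, y, z, global, seg_1..seg_K) feature layout per output point.
"""
from typing import Dict, List, Sequence, Set, Tuple

Point = Tuple[float, float, float]
Voxel = Tuple[int, int, int]


def voxelize(frame: Sequence[Point], origin: Point, voxel_size: float) -> Set[Voxel]:
    """Map one frame's 3D points to the set of occupied voxel indices."""
    ox, oy, oz = origin
    return {
        (int((x - ox) // voxel_size), int((y - oy) // voxel_size), int((z - oz) // voxel_size))
        for x, y, z in frame
    }


def rank_pool_weights(num_frames: int) -> List[int]:
    """Approximate rank pooling weights 2t - T - 1 for t = 1..T; they sum to zero."""
    return [2 * t - num_frames - 1 for t in range(1, num_frames + 1)]


def rank_pool(occupancy: Sequence[Set[Voxel]]) -> Dict[Voxel, float]:
    """Weighted sum over time of each voxel's binary occupancy sequence."""
    motion: Dict[Voxel, float] = {}
    for weight, occupied in zip(rank_pool_weights(len(occupancy)), occupancy):
        for voxel in occupied:
            motion[voxel] = motion.get(voxel, 0.0) + weight
    return motion


def dynamic_voxel_points(
    frames: Sequence[Sequence[Point]],
    origin: Point,
    voxel_size: float,
    num_segments: int = 2,
) -> List[Tuple[float, ...]]:
    """Return one point per voxel occupied in any frame: centre, global and per-segment motion."""
    if len(frames) < num_segments:
        raise ValueError("need at least one frame per temporal segment")
    occupancy = [voxelize(f, origin, voxel_size) for f in frames]
    total = len(occupancy)
    pooled = [rank_pool(occupancy)]  # global motion value over the whole clip
    for s in range(num_segments):
        # Contiguous, non-empty segments [s*T//K, (s+1)*T//K), pooled independently.
        start, end = s * total // num_segments, (s + 1) * total // num_segments
        pooled.append(rank_pool(occupancy[start:end]))
    ox, oy, oz = origin
    points = []
    for i, j, k in sorted(set().union(*occupancy)):
        centre = ((i + 0.5) * voxel_size + ox, (j + 0.5) * voxel_size + oy, (k + 0.5) * voxel_size + oz)
        # A voxel missing from a segment's pooling result contributes 0.0 for that segment.
        points.append(centre + tuple(p.get((i, j, k), 0.0) for p in pooled))
    return points


if __name__ == "__main__":
    # One point moving along x over three frames, plus one static point.
    frames = [
        [(0.5, 0.5, 0.5), (0.5, 5.5, 0.5)],
        [(1.5, 0.5, 0.5), (0.5, 5.5, 0.5)],
        [(2.5, 0.5, 0.5), (0.5, 5.5, 0.5)],
    ]
    for point in dynamic_voxel_points(frames, (0.0, 0.0, 0.0), 1.0, num_segments=2):
        print(point)
```

In the usage example, the voxel occupied in all three frames gets a global motion value of 0 because the weights sum to zero, while the moving point's first voxel gets a negative value and its last voxel a positive one. That signed, order-aware value per voxel is the kind of spatial-plus-motion feature the abstract says is handed to a point network such as PointNet++.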
Author | Xiong, Fu; Wang, Yancheng; Jiang, Wenxiang; Zhou, Joey Tianyi; Xiao, Yang; Yuan, Junsong; Cao, Zhiguo |
Author_xml |
– sequence: 1 givenname: Yancheng surname: Wang fullname: Wang, Yancheng organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 2 givenname: Yang surname: Xiao fullname: Xiao, Yang organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 3 givenname: Fu surname: Xiong fullname: Xiong, Fu organization: Megvii Research Nanjing, Megvii Technology, China
– sequence: 4 givenname: Wenxiang surname: Jiang fullname: Jiang, Wenxiang organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 5 givenname: Zhiguo surname: Cao fullname: Cao, Zhiguo organization: National Key Laboratory of Science and Technology on Multispectral Information Processing, School of Artificial Intelligence and Automation, Huazhong University of Science and Technology, China
– sequence: 6 givenname: Joey Tianyi surname: Zhou fullname: Zhou, Joey Tianyi organization: IHPC, A*STAR, Singapore
– sequence: 7 givenname: Junsong surname: Yuan fullname: Yuan, Junsong organization: CSE Department, State University of New York at Buffalo |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Applied Sciences |
EISBN | 9781728171685; 1728171687 |
EISSN | 1063-6919 |
EndPage | 517 |
ExternalDocumentID | 9157595 |
Genre | orig-research |
GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
ID | FETCH-LOGICAL-i203t-b74732bb77f981f6732b488b51cae19e28c8180436d1d22f9cc20c4a8da15cbd3 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:30:35 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i203t-b74732bb77f981f6732b488b51cae19e28c8180436d1d22f9cc20c4a8da15cbd3 |
PageCount | 10 |
ParticipantIDs | ieee_primary_9157595 |
PublicationCentury | 2000 |
PublicationDate | 2020-Jun |
PublicationDateYYYYMMDD | 2020-06-01 |
PublicationDate_xml | – month: 06 year: 2020 text: 2020-Jun |
PublicationDecade | 2020 |
PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2020 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0003211698 |
Score | 2.503407 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 508 |
SubjectTerms | Dynamics; Machine learning; Pattern recognition; Skeleton; Solid modeling; Three-dimensional displays; Two dimensional displays |
Title | 3DV: 3D Dynamic Voxel for Action Recognition in Depth Video |
URI | https://ieeexplore.ieee.org/document/9157595 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%28IEEE+Computer+Society+Conference+on+Computer+Vision+and+Pattern+Recognition.+Online%29&rft.atitle=3DV%3A+3D+Dynamic+Voxel+for+Action+Recognition+in+Depth+Video&rft.au=Wang%2C+Yancheng&rft.au=Xiao%2C+Yang&rft.au=Xiong%2C+Fu&rft.au=Jiang%2C+Wenxiang&rft.date=2020-06-01&rft.pub=IEEE&rft.eissn=1063-6919&rft.spage=508&rft.epage=517&rft_id=info:doi/10.1109%2FCVPR42600.2020.00059&rft.externalDocID=9157595 |