Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition
Published in | IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, No. 9, pp. 12130-12141 |
---|---|
Main Authors | Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi |
Format | Journal Article |
Language | English |
Published | United States: IEEE, 01.09.2024 |
Subjects | Context awareness; Convolutional neural networks; Feature extraction; Heterogeneous context learning; Multiscale graph; Representation learning; Skeleton-based action recognition; Spatial-temporal feature representation; Spatiotemporal phenomena; Topology |
Abstract | Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and, therefore, is crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation to capture rich movement patterns from these skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered as the middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation behind HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Considering HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods, with accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton. |
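To make the aggregation idea in the abstract concrete, the following is a minimal PyTorch sketch reconstructed from the abstract alone. It scores pairwise interactions between each node and every joint within a small cross-space-time window, then routes same-frame (spatial), same-joint (temporal), and cross-joint/cross-frame (spatial-temporal) neighbors through separate transforms. The class name `HetGCNSketch`, the window radius `tau`, and the three per-cue linear maps are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HetGCNSketch(nn.Module):
    """Heterogeneous space-time aggregation over a (2*tau+1)-frame window.

    Input x: (N, T, V, C) -- batch, frames, joints, channels.
    """
    def __init__(self, in_channels, out_channels, tau=1):
        super().__init__()
        self.tau = tau
        self.out_channels = out_channels
        # Query/key maps produce the dynamic pairwise interaction scores.
        self.q = nn.Linear(in_channels, out_channels)
        self.k = nn.Linear(in_channels, out_channels)
        # One value transform per heterogeneous cue (an assumption here).
        self.w_spatial = nn.Linear(in_channels, out_channels)   # same frame
        self.w_temporal = nn.Linear(in_channels, out_channels)  # same joint, other frame
        self.w_st = nn.Linear(in_channels, out_channels)        # other joint, other frame

    def forward(self, x):
        N, T, V, C = x.shape
        q, k = self.q(x), self.k(x)
        out = torch.zeros(N, T, V, self.out_channels, device=x.device)
        eye = torch.eye(V, device=x.device)  # 1 where neighbor is the same joint
        for dt in range(-self.tau, self.tau + 1):
            # Valid source frames whose neighbor frame t+dt stays in range.
            t_src = torch.arange(max(0, -dt), min(T, T - dt), device=x.device)
            t_nbr = t_src + dt
            # Pairwise scores between each node and every joint in the
            # neighboring frame: (N, |t_src|, V, V), softmax over neighbors.
            attn = torch.einsum('ntvc,ntwc->ntvw', q[:, t_src], k[:, t_nbr])
            attn = F.softmax(attn / self.out_channels ** 0.5, dim=-1)
            if dt == 0:
                # Spatial cue: all same-frame neighbors share one transform.
                msg = torch.einsum('ntvw,ntwc->ntvc',
                                   attn, self.w_spatial(x[:, t_nbr]))
            else:
                # Temporal cue (same joint) vs spatial-temporal cue (cross joint).
                msg = (torch.einsum('ntvw,vw,ntwc->ntvc',
                                    attn, eye, self.w_temporal(x[:, t_nbr]))
                       + torch.einsum('ntvw,vw,ntwc->ntvc',
                                      attn, 1 - eye, self.w_st(x[:, t_nbr])))
            out[:, t_src] = out[:, t_src] + msg
        return out

# Usage: NTU-style 25-joint skeletons with 3-D coordinates.
x = torch.randn(2, 16, 25, 3)
y = HetGCNSketch(3, 64)(x)  # -> (2, 16, 25, 64)
```

Splitting the value transform by cue, rather than sharing one 3-D kernel, is what the abstract frames as the middle ground between (2 + 1)-D factorization and full 3-D convolution; the intra-scale and inter-scale instantiations mentioned there are not reproduced in this sketch.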
Author | Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi |
Author_xml | – sequence: 1; givenname: Xuehao; surname: Gao; fullname: Gao, Xuehao; orcidid: 0000-0003-3168-5770; email: gaoxuehao.xjtu@gmail.com; organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China
– sequence: 2; givenname: Yang; surname: Yang; fullname: Yang, Yang; orcidid: 0000-0001-8687-4427; email: yyang@mail.xjtu.edu.cn; organization: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 3; givenname: Yang; surname: Wu; fullname: Wu, Yang; email: dylanywu@tencent.com; organization: Tencent AI Laboratory, Shenzhen, China
– sequence: 4; givenname: Shaoyi; surname: Du; fullname: Du, Shaoyi; orcidid: 0000-0002-7092-0596; email: dushaoyi@gmail.com; organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/37030786 (View this record in MEDLINE/PubMed) |
CODEN | ITNNAL |
CitedBy_id | 10.1109/TMM.2024.3521774; 10.1109/TCSVT.2024.3491133 |
Cites_doi | 10.1109/ICPR56361.2022.9956300 10.1109/CVPR52688.2022.00298 10.1609/aaai.v34i01.5438 10.1109/tnnls.2022.3201518 10.1109/TNNLS.2019.2935173 10.1109/CVPR.2018.00675 10.1109/CVPR52688.2022.01952 10.1109/CVPR.2019.00371 10.1109/ICCV48922.2021.01311 10.1109/TIP.2021.3104182 10.1109/CVPR.2018.00155 10.1109/CVPR42600.2020.00026 10.1109/ICCV48922.2021.01316 10.1609/aaai.v32i1.12328 10.1109/TITS.2021.3135251 10.1109/CVPR42600.2020.00029 10.1109/LRA.2021.3139369 10.1007/978-3-030-01246-5_7 10.1109/CVPR42600.2020.00119 10.1109/ICCV48922.2021.01127 10.1109/CVPR.2019.01230 10.1109/CVPR.2019.00810 10.1145/3343031.3351170 10.1109/TPAMI.2019.2916873 10.1109/TPAMI.2012.59 10.1109/CVPR.2017.143 10.1109/CVPR.2014.82 10.1609/aaai.v33i01.3301922 10.1109/CVPR42600.2020.00187 10.1609/aaai.v34i03.5652 10.1109/CVPR52688.2022.00300 10.1109/CVPR.2016.115 10.48550/ARXIV.1706.03762 10.1109/TIP.2021.3108708 10.1109/ICCV48922.2021.01341 10.1109/CVPR42600.2020.00022 10.24963/ijcai.2019/274 10.1109/CVPR52688.2022.01933 10.1145/3369318.3369325 10.1109/TMM.2021.3127040 10.1109/CVPR.2017.502 10.1109/CVPR42600.2020.01434 10.1109/CVPR52688.2022.01955 10.1109/TNNLS.2021.3061115 |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION NPM 7X8 |
DOI | 10.1109/TNNLS.2023.3252172 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef PubMed MEDLINE - Academic |
DatabaseTitle | CrossRef PubMed MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic PubMed |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 2162-2388 |
EndPage | 12141 |
ExternalDocumentID | 37030786 10_1109_TNNLS_2023_3252172 10081331 |
Genre | orig-research Journal Article |
GrantInformation_xml | – fundername: National Key Research and Development Program of China grantid: 2018AAA0102500 |
IEDL.DBID | RIE |
ISSN | 2162-237X 2162-2388 |
IsPeerReviewed | true
IsScholarly | true |
Issue | 9 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
Notes | ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2 |
ORCID | 0000-0003-3168-5770 0000-0001-8687-4427 0000-0002-7092-0596 |
PMID | 37030786 |
PQID | 2798714381 |
PQPubID | 23479 |
PageCount | 12 |
ParticipantIDs | crossref_primary_10_1109_TNNLS_2023_3252172 proquest_miscellaneous_2798714381 crossref_citationtrail_10_1109_TNNLS_2023_3252172 pubmed_primary_37030786 ieee_primary_10081331 |
PublicationCentury | 2000 |
PublicationDate | 2024-09-01 |
PublicationDateYYYYMMDD | 2024-09-01 |
PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev | TNNLS |
PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
References | ref35 ref12 ref34 ref15 ref37 ref14 ref36 ref31 ref30 ref11 ref33 ref10 ref32 ref2 ref1 ref17 ref39 Gehring (ref13) ref16 ref38 ref19 ref24 ref46 ref23 ref45 ref26 ref25 ref20 ref42 ref41 ref22 ref44 ref21 ref43 ref28 ref27 ref29 ref8 ref7 ref9 ref4 ref3 ref6 ref5 Kay (ref18) 2017 ref40 |
References_xml | – ident: ref16 doi: 10.1109/ICPR56361.2022.9956300 – ident: ref10 doi: 10.1109/CVPR52688.2022.00298 – ident: ref35 doi: 10.1609/aaai.v34i01.5438 – ident: ref28 doi: 10.1109/tnnls.2022.3201518 – ident: ref45 doi: 10.1109/TNNLS.2019.2935173 – ident: ref37 doi: 10.1109/CVPR.2018.00675 – ident: ref46 doi: 10.1109/CVPR52688.2022.01952 – ident: ref20 doi: 10.1109/CVPR.2019.00371 – ident: ref4 doi: 10.1109/ICCV48922.2021.01311 – ident: ref7 doi: 10.1109/TIP.2021.3104182 – ident: ref40 doi: 10.1109/CVPR.2018.00155 – ident: ref6 doi: 10.1109/CVPR42600.2020.00026 – year: 2017 ident: ref18 article-title: The kinetics human action video dataset publication-title: arXiv:1705.06950 – ident: ref33 doi: 10.1109/ICCV48922.2021.01316 – ident: ref42 doi: 10.1609/aaai.v32i1.12328 – ident: ref41 doi: 10.1109/TITS.2021.3135251 – ident: ref21 doi: 10.1109/CVPR42600.2020.00029 – ident: ref29 doi: 10.1109/LRA.2021.3139369 – ident: ref34 doi: 10.1007/978-3-030-01246-5_7 – ident: ref43 doi: 10.1109/CVPR42600.2020.00119 – ident: ref9 doi: 10.1109/ICCV48922.2021.01127 – ident: ref32 doi: 10.1109/CVPR.2019.01230 – ident: ref31 doi: 10.1109/CVPR.2019.00810 – ident: ref11 doi: 10.1145/3343031.3351170 – ident: ref25 doi: 10.1109/TPAMI.2019.2916873 – ident: ref17 doi: 10.1109/TPAMI.2012.59 – ident: ref2 doi: 10.1109/CVPR.2017.143 – ident: ref39 doi: 10.1109/CVPR.2014.82 – ident: ref14 doi: 10.1609/aaai.v33i01.3301922 – ident: ref24 doi: 10.1109/CVPR42600.2020.00187 – ident: ref27 doi: 10.1609/aaai.v34i03.5652 – ident: ref15 doi: 10.1109/CVPR52688.2022.00300 – ident: ref30 doi: 10.1109/CVPR.2016.115 – ident: ref38 doi: 10.48550/ARXIV.1706.03762 – ident: ref22 doi: 10.1109/TIP.2021.3108708 – ident: ref23 doi: 10.1109/ICCV48922.2021.01341 – ident: ref26 doi: 10.1109/CVPR42600.2020.00022 – ident: ref1 doi: 10.24963/ijcai.2019/274 – ident: ref36 doi: 10.1109/CVPR52688.2022.01933 – ident: ref5 doi: 10.1145/3369318.3369325 – ident: ref12 doi: 10.1109/TMM.2021.3127040 – start-page: 1243 volume-title: Proc. 34th Int. Conf. Mach. Learn. ident: ref13 article-title: Convolutional sequence to sequence learning – ident: ref3 doi: 10.1109/CVPR.2017.502 – ident: ref44 doi: 10.1109/CVPR42600.2020.01434 – ident: ref8 doi: 10.1109/CVPR52688.2022.01955 – ident: ref19 doi: 10.1109/TNNLS.2021.3061115 |
SourceID | proquest pubmed crossref ieee |
SourceType | Aggregation Database Index Database Enrichment Source Publisher |
StartPage | 12130 |
SubjectTerms | Context awareness Convolutional neural networks Feature extraction Heterogeneous context learning multiscale graph Representation learning Skeleton skeleton-based action recognition spatial–temporal feature representation Spatiotemporal phenomena Topology |
Title | Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition |
URI | https://ieeexplore.ieee.org/document/10081331 https://www.ncbi.nlm.nih.gov/pubmed/37030786 https://www.proquest.com/docview/2798714381 |
Volume | 35 |