Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition
Published in | IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, No. 9, pp. 12130-12141 |
---|---|
Main Authors | Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi |
Format | Journal Article |
Language | English |
Published | United States: IEEE, 01.09.2024 |
Subjects | Context awareness; Convolutional neural networks; Feature extraction; Heterogeneous context learning; Multiscale graph; Representation learning; Skeleton-based action recognition; Spatial-temporal feature representation; Spatiotemporal phenomena; Topology |
Abstract | Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in the skeleton-based action recognition task. In GCNs, node interaction modeling dominates the context aggregation and, therefore, is crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation to capture rich movement patterns from these skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered as the middle ground between the extremes of (2 + 1)-D and 3-D graph convolution. The core observation behind HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Considering HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (i.e., intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we propose a strong human action recognition system that outperforms state-of-the-art methods, with accuracies of 93.1% on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton. |
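To make the aggregation idea in the abstract concrete, the following is a minimal PyTorch sketch reconstructed from the abstract alone. It scores pairwise interactions between each node and every joint within a small cross-space-time window, then routes same-frame (spatial), same-joint (temporal), and cross-joint/cross-frame (spatial-temporal) neighbors through separate transforms. The class name `HetGCNSketch`, the window radius `tau`, and the three per-cue linear maps are illustrative assumptions, not the authors' published implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HetGCNSketch(nn.Module):
    """Heterogeneous space-time aggregation over a (2*tau+1)-frame window.

    Input x: (N, T, V, C) -- batch, frames, joints, channels.
    """
    def __init__(self, in_channels, out_channels, tau=1):
        super().__init__()
        self.tau = tau
        self.out_channels = out_channels
        # Query/key maps produce the dynamic pairwise interaction scores.
        self.q = nn.Linear(in_channels, out_channels)
        self.k = nn.Linear(in_channels, out_channels)
        # One value transform per heterogeneous cue (an assumption here).
        self.w_spatial = nn.Linear(in_channels, out_channels)   # same frame
        self.w_temporal = nn.Linear(in_channels, out_channels)  # same joint, other frame
        self.w_st = nn.Linear(in_channels, out_channels)        # other joint, other frame

    def forward(self, x):
        N, T, V, C = x.shape
        q, k = self.q(x), self.k(x)
        out = torch.zeros(N, T, V, self.out_channels, device=x.device)
        eye = torch.eye(V, device=x.device)  # 1 where neighbor is the same joint
        for dt in range(-self.tau, self.tau + 1):
            # Valid source frames whose neighbor frame t+dt stays in range.
            t_src = torch.arange(max(0, -dt), min(T, T - dt), device=x.device)
            t_nbr = t_src + dt
            # Pairwise scores between each node and every joint in the
            # neighboring frame: (N, |t_src|, V, V), softmax over neighbors.
            attn = torch.einsum('ntvc,ntwc->ntvw', q[:, t_src], k[:, t_nbr])
            attn = F.softmax(attn / self.out_channels ** 0.5, dim=-1)
            if dt == 0:
                # Spatial cue: all same-frame neighbors share one transform.
                msg = torch.einsum('ntvw,ntwc->ntvc',
                                   attn, self.w_spatial(x[:, t_nbr]))
            else:
                # Temporal cue (same joint) vs spatial-temporal cue (cross joint).
                msg = (torch.einsum('ntvw,vw,ntwc->ntvc',
                                    attn, eye, self.w_temporal(x[:, t_nbr]))
                       + torch.einsum('ntvw,vw,ntwc->ntvc',
                                      attn, 1 - eye, self.w_st(x[:, t_nbr])))
            out[:, t_src] = out[:, t_src] + msg
        return out

# Usage: NTU-style 25-joint skeletons with 3-D coordinates.
x = torch.randn(2, 16, 25, 3)
y = HetGCNSketch(3, 64)(x)  # -> (2, 16, 25, 64)
```

Splitting the value transform by cue, rather than sharing one 3-D kernel, is what the abstract frames as the middle ground between (2 + 1)-D factorization and full 3-D convolution; the intra-scale and inter-scale instantiations mentioned there are not reproduced in this sketch.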
Author | Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi |
Author_xml | – sequence: 1; givenname: Xuehao; surname: Gao; fullname: Gao, Xuehao; orcidid: 0000-0003-3168-5770; email: gaoxuehao.xjtu@gmail.com; organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China
– sequence: 2; givenname: Yang; surname: Yang; fullname: Yang, Yang; orcidid: 0000-0001-8687-4427; email: yyang@mail.xjtu.edu.cn; organization: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
– sequence: 3; givenname: Yang; surname: Wu; fullname: Wu, Yang; email: dylanywu@tencent.com; organization: Tencent AI Laboratory, Shenzhen, China
– sequence: 4; givenname: Shaoyi; surname: Du; fullname: Du, Shaoyi; orcidid: 0000-0002-7092-0596; email: dushaoyi@gmail.com; organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/37030786 (View this record in MEDLINE/PubMed) |
CODEN | ITNNAL |
CitedBy_id | 10.1109/TMM.2024.3521774; 10.1109/TCSVT.2024.3491133 |
Cites_doi | 10.1109/ICPR56361.2022.9956300 10.1109/CVPR52688.2022.00298 10.1609/aaai.v34i01.5438 10.1109/tnnls.2022.3201518 10.1109/TNNLS.2019.2935173 10.1109/CVPR.2018.00675 10.1109/CVPR52688.2022.01952 10.1109/CVPR.2019.00371 10.1109/ICCV48922.2021.01311 10.1109/TIP.2021.3104182 10.1109/CVPR.2018.00155 10.1109/CVPR42600.2020.00026 10.1109/ICCV48922.2021.01316 10.1609/aaai.v32i1.12328 10.1109/TITS.2021.3135251 10.1109/CVPR42600.2020.00029 10.1109/LRA.2021.3139369 10.1007/978-3-030-01246-5_7 10.1109/CVPR42600.2020.00119 10.1109/ICCV48922.2021.01127 10.1109/CVPR.2019.01230 10.1109/CVPR.2019.00810 10.1145/3343031.3351170 10.1109/TPAMI.2019.2916873 10.1109/TPAMI.2012.59 10.1109/CVPR.2017.143 10.1109/CVPR.2014.82 10.1609/aaai.v33i01.3301922 10.1109/CVPR42600.2020.00187 10.1609/aaai.v34i03.5652 10.1109/CVPR52688.2022.00300 10.1109/CVPR.2016.115 10.48550/ARXIV.1706.03762 10.1109/TIP.2021.3108708 10.1109/ICCV48922.2021.01341 10.1109/CVPR42600.2020.00022 10.24963/ijcai.2019/274 10.1109/CVPR52688.2022.01933 10.1145/3369318.3369325 10.1109/TMM.2021.3127040 10.1109/CVPR.2017.502 10.1109/CVPR42600.2020.01434 10.1109/CVPR52688.2022.01955 10.1109/TNNLS.2021.3061115 |
ContentType | Journal Article |
DBID | 97E RIA RIE AAYXX CITATION NPM 7X8 |
DOI | 10.1109/TNNLS.2023.3252172 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef PubMed MEDLINE - Academic |
DatabaseTitle | CrossRef PubMed MEDLINE - Academic |
DatabaseTitleList | MEDLINE - Academic PubMed |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Computer Science |
EISSN | 2162-2388 |
EndPage | 12141 |
ExternalDocumentID | 37030786 10_1109_TNNLS_2023_3252172 10081331 |
Genre | orig-research Journal Article |
GrantInformation_xml | – fundername: National Key Research and Development Program of China grantid: 2018AAA0102500 |
IEDL.DBID | RIE |
ISSN | 2162-237X 2162-2388 |
IsPeerReviewed | true
IsScholarly | true |
Issue | 9 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |
Notes | ObjectType-Article-1; SourceType-Scholarly Journals-1; ObjectType-Feature-2 |
ORCID | 0000-0003-3168-5770 0000-0001-8687-4427 0000-0002-7092-0596 |
PMID | 37030786 |
PQID | 2798714381 |
PQPubID | 23479 |
PageCount | 12 |
ParticipantIDs | crossref_primary_10_1109_TNNLS_2023_3252172 proquest_miscellaneous_2798714381 crossref_citationtrail_10_1109_TNNLS_2023_3252172 pubmed_primary_37030786 ieee_primary_10081331 |
PublicationCentury | 2000 |
PublicationDate | 2024-09-01 |
PublicationDateYYYYMMDD | 2024-09-01 |
PublicationDate_xml | – month: 09 year: 2024 text: 2024-09-01 day: 01 |
PublicationDecade | 2020 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States |
PublicationTitle | IEEE Transactions on Neural Networks and Learning Systems
PublicationTitleAbbrev | TNNLS |
PublicationTitleAlternate | IEEE Trans Neural Netw Learn Syst |
PublicationYear | 2024 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
References | ref35 ref12 ref34 ref15 ref37 ref14 ref36 ref31 ref30 ref11 ref33 ref10 ref32 ref2 ref1 ref17 ref39 Gehring (ref13) ref16 ref38 ref19 ref24 ref46 ref23 ref45 ref26 ref25 ref20 ref42 ref41 ref22 ref44 ref21 ref43 ref28 ref27 ref29 ref8 ref7 ref9 ref4 ref3 ref6 ref5 Kay (ref18) 2017 ref40 |
References_xml | – ident: ref16 doi: 10.1109/ICPR56361.2022.9956300 – ident: ref10 doi: 10.1109/CVPR52688.2022.00298 – ident: ref35 doi: 10.1609/aaai.v34i01.5438 – ident: ref28 doi: 10.1109/tnnls.2022.3201518 – ident: ref45 doi: 10.1109/TNNLS.2019.2935173 – ident: ref37 doi: 10.1109/CVPR.2018.00675 – ident: ref46 doi: 10.1109/CVPR52688.2022.01952 – ident: ref20 doi: 10.1109/CVPR.2019.00371 – ident: ref4 doi: 10.1109/ICCV48922.2021.01311 – ident: ref7 doi: 10.1109/TIP.2021.3104182 – ident: ref40 doi: 10.1109/CVPR.2018.00155 – ident: ref6 doi: 10.1109/CVPR42600.2020.00026 – year: 2017 ident: ref18 article-title: The kinetics human action video dataset publication-title: arXiv:1705.06950 – ident: ref33 doi: 10.1109/ICCV48922.2021.01316 – ident: ref42 doi: 10.1609/aaai.v32i1.12328 – ident: ref41 doi: 10.1109/TITS.2021.3135251 – ident: ref21 doi: 10.1109/CVPR42600.2020.00029 – ident: ref29 doi: 10.1109/LRA.2021.3139369 – ident: ref34 doi: 10.1007/978-3-030-01246-5_7 – ident: ref43 doi: 10.1109/CVPR42600.2020.00119 – ident: ref9 doi: 10.1109/ICCV48922.2021.01127 – ident: ref32 doi: 10.1109/CVPR.2019.01230 – ident: ref31 doi: 10.1109/CVPR.2019.00810 – ident: ref11 doi: 10.1145/3343031.3351170 – ident: ref25 doi: 10.1109/TPAMI.2019.2916873 – ident: ref17 doi: 10.1109/TPAMI.2012.59 – ident: ref2 doi: 10.1109/CVPR.2017.143 – ident: ref39 doi: 10.1109/CVPR.2014.82 – ident: ref14 doi: 10.1609/aaai.v33i01.3301922 – ident: ref24 doi: 10.1109/CVPR42600.2020.00187 – ident: ref27 doi: 10.1609/aaai.v34i03.5652 – ident: ref15 doi: 10.1109/CVPR52688.2022.00300 – ident: ref30 doi: 10.1109/CVPR.2016.115 – ident: ref38 doi: 10.48550/ARXIV.1706.03762 – ident: ref22 doi: 10.1109/TIP.2021.3108708 – ident: ref23 doi: 10.1109/ICCV48922.2021.01341 – ident: ref26 doi: 10.1109/CVPR42600.2020.00022 – ident: ref1 doi: 10.24963/ijcai.2019/274 – ident: ref36 doi: 10.1109/CVPR52688.2022.01933 – ident: ref5 doi: 10.1145/3369318.3369325 – ident: ref12 doi: 10.1109/TMM.2021.3127040 – start-page: 1243 volume-title: Proc. 34th Int. Conf. Mach. Learn. ident: ref13 article-title: Convolutional sequence to sequence learning – ident: ref3 doi: 10.1109/CVPR.2017.502 – ident: ref44 doi: 10.1109/CVPR42600.2020.01434 – ident: ref8 doi: 10.1109/CVPR52688.2022.01955 – ident: ref19 doi: 10.1109/TNNLS.2021.3061115 |
SourceID | proquest pubmed crossref ieee |
SourceType | Aggregation Database Index Database Enrichment Source Publisher |
StartPage | 12130 |
SubjectTerms | Context awareness Convolutional neural networks Feature extraction Heterogeneous context learning multiscale graph Representation learning Skeleton skeleton-based action recognition spatial–temporal feature representation Spatiotemporal phenomena Topology |
Title | Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition |
URI | https://ieeexplore.ieee.org/document/10081331 https://www.ncbi.nlm.nih.gov/pubmed/37030786 https://www.proquest.com/docview/2798714381 |
Volume | 35 |