Learning Heterogeneous Spatial-Temporal Context for Skeleton-Based Action Recognition

Bibliographic Details
Published in: IEEE Transactions on Neural Networks and Learning Systems, Vol. 35, No. 9, pp. 12130-12141 (12 pages)
Main Authors: Gao, Xuehao; Yang, Yang; Wu, Yang; Du, Shaoyi
Format: Journal Article
Language: English
Published: IEEE, United States, 01.09.2024
DOI: 10.1109/TNNLS.2023.3252172
ISSN: 2162-237X
EISSN: 2162-2388
PMID: 37030786
Funding: National Key Research and Development Program of China, Grant 2018AAA0102500
Subjects: Context awareness; Convolutional neural networks; Feature extraction; Heterogeneous context learning; multiscale graph; Representation learning; Skeleton; skeleton-based action recognition; spatial–temporal feature representation; Spatiotemporal phenomena; Topology

Abstract: Graph convolution networks (GCNs) have been widely used and have achieved fruitful progress in skeleton-based action recognition. In GCNs, node interaction modeling dominates context aggregation and is therefore crucial for a graph-based convolution kernel to extract representative features. In this article, we take a closer look at a powerful graph convolution formulation for capturing rich movement patterns from skeleton-based graphs. Specifically, we propose a novel heterogeneous graph convolution (HetGCN) that can be considered a middle ground between the extremes of (2+1)-D and 3-D graph convolution. The core observation behind HetGCN is that multiple information flows are jointly intertwined in a 3-D convolution kernel, including spatial, temporal, and spatial-temporal cues. Since spatial and temporal information flows characterize different cues for action recognition, HetGCN first dynamically analyzes the pairwise interactions between each node and its cross-space-time neighbors and then encourages heterogeneous context aggregation among them. Treating HetGCN as a generic convolution formulation, we further develop it into two specific instantiations (intra-scale and inter-scale HetGCN) that significantly facilitate cross-space-time and cross-scale learning on skeleton graphs. By integrating these modules, we build a strong human action recognition system that outperforms state-of-the-art methods, reaching 93.1% accuracy on the NTU-60 cross-subject (X-Sub) benchmark, 88.9% on the NTU-120 X-Sub benchmark, and 38.4% on Kinetics-Skeleton.
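
The abstract describes HetGCN only at a high level: spatial, temporal, and spatial-temporal information flows are aggregated with flow-specific (heterogeneous) weights instead of one entangled 3-D kernel. The PyTorch sketch below illustrates that decomposition on the standard (batch, channels, frames, joints) skeleton tensor. It is a minimal illustration under our own assumptions: the module name, the fixed temporal radius, and the learnable adjacency are all hypothetical, and the authors' actual method additionally computes dynamic pairwise interactions and multi-scale (intra-/inter-scale) variants.

```python
# Hypothetical sketch of a heterogeneous spatial-temporal graph convolution.
# Input x: (N, C, T, V) = batch, channels, frames, joints, the usual layout
# in skeleton-based action recognition. Each information flow gets its own
# transform, rather than one 3-D kernel mixing all of them.
import torch
import torch.nn as nn
import torch.nn.functional as F

class HetGCNSketch(nn.Module):
    def __init__(self, in_c, out_c, num_joints, tau=1):
        super().__init__()
        self.tau = tau  # temporal radius of the cross-space-time window
        # Heterogeneous transforms: one per flow (spatial, temporal,
        # spatial-temporal); 1x1 convs act per node-frame position.
        self.w_spatial = nn.Conv2d(in_c, out_c, 1)
        self.w_temporal = nn.Conv2d(in_c, out_c, 1)
        self.w_st = nn.Conv2d(in_c, out_c, 1)
        # Learnable joint-to-joint adjacency; the paper's dynamic pairwise
        # interaction analysis would condition this on the input features.
        self.A = nn.Parameter(
            torch.eye(num_joints) + 0.01 * torch.randn(num_joints, num_joints))

    def forward(self, x):                      # x: (N, C, T, V)
        A = F.softmax(self.A, dim=-1)          # normalize neighbor weights
        # Spatial flow: aggregate over other joints in the same frame.
        spatial = torch.einsum('nctv,vw->nctw', x, A)
        # Temporal flow: same joint, +/- tau frames (wrap-around padding
        # via roll is a simplification).
        temporal = x.roll(self.tau, dims=2) + x.roll(-self.tau, dims=2)
        # Spatial-temporal flow: other joints in neighboring frames.
        st = torch.einsum('nctv,vw->nctw', temporal, A)
        # Heterogeneous aggregation: flow-specific transforms, then sum.
        return self.w_spatial(spatial) + self.w_temporal(temporal) + self.w_st(st)

x = torch.randn(2, 3, 64, 25)                  # e.g. NTU skeletons: 25 joints
print(HetGCNSketch(3, 64, num_joints=25)(x).shape)  # torch.Size([2, 64, 64, 25])
```

A stack of such layers interleaved with temporal convolutions, plus the inter-scale variant operating across coarsened skeleton graphs, is what the abstract's full recognition system composes.
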
Authors
– Gao, Xuehao
  ORCID: 0000-0003-3168-5770
  Email: gaoxuehao.xjtu@gmail.com
  Organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China
– Yang, Yang
  ORCID: 0000-0001-8687-4427
  Email: yyang@mail.xjtu.edu.cn
  Organization: School of Electronic and Information Engineering, Xi'an Jiaotong University, Xi'an, China
– Wu, Yang
  Email: dylanywu@tencent.com
  Organization: Tencent AI Laboratory, Shenzhen, China
– Du, Shaoyi
  ORCID: 0000-0002-7092-0596
  Email: dushaoyi@gmail.com
  Organization: National Key Laboratory of Human-Machine Hybrid Augmented Intelligence, Xi'an Jiaotong University, Xi'an, China