Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information

Bibliographic Details
Published in IEEE Access, Vol. 12, pp. 193091-193100
Main Authors Li, Meng; Wu, Yaqi; Sun, Qiumei; Yang, Weifeng
Format Journal Article
Language English
Published Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024
ISSN 2169-3536
DOI 10.1109/ACCESS.2024.3516511

Abstract Recognizing person-person interactions is practically significant, and this type of interaction recognition is applied in many fields, such as video understanding and video surveillance. Compared with RGB data, skeletal data can depict articulated human movements more accurately because it records joint locations in detail. With the recent success of the Transformer in computer vision, numerous scholars have begun to apply Transformers to recognize person-person interactions. However, these Transformer-based models do not fully take into account the dynamic spatiotemporal relationship between interacting people, which remains a challenge. To handle this challenge, we propose a novel Transformer-based model called the Two-Stream Proximity Graph Transformer (2s-PGT) to recognize skeletal person-person interactions. Specifically, we first design three types of proximity graphs based on skeletal data to encode the dynamic proximity relationship between interacting people: frame-based, sample-based, and type-based proximity graphs. Second, we embed the proximity graphs into our Transformer-based model to jointly learn the relationship between interacting people from spatiotemporal and semantic perspectives. Third, we investigate a two-stream framework that integrates the information of interactive joints and interactive bones to improve the accuracy of interaction recognition. Experimental results on three public datasets, the SBU dataset (99.07%), the NTU-RGB+D dataset (Cross-Subject 95.72%, Cross-View 97.87%), and the NTU-RGB+D 120 dataset (Cross-Subject 92.01%, Cross-View 91.65%), demonstrate that our approach outperforms state-of-the-art methods.
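The record itself contains no code. As a rough illustration of the two ideas the abstract names, injecting a proximity-graph bias into Transformer self-attention and fusing a joint stream with a bone stream, the following minimal PyTorch-style sketch may help; every class name, tensor shape, and default weight in it is an assumption made for illustration, not the authors' 2s-PGT implementation.

```python
# Illustrative sketch only, not the authors' released code.
import torch
import torch.nn as nn


class ProximityBiasedAttention(nn.Module):
    """Self-attention over joint tokens with an additive proximity-graph bias."""

    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, tokens: torch.Tensor, proximity_bias: torch.Tensor) -> torch.Tensor:
        # tokens:         (batch, 2 * num_joints, dim) joint tokens of both people
        # proximity_bias: (2 * num_joints, 2 * num_joints) float matrix, e.g. derived
        #                 from a frame-, sample-, or type-based proximity graph;
        #                 a float attn_mask is added to the attention scores.
        out, _ = self.attn(tokens, tokens, tokens, attn_mask=proximity_bias)
        return out


def fuse_two_streams(joint_logits: torch.Tensor,
                     bone_logits: torch.Tensor,
                     w_joint: float = 0.5,
                     w_bone: float = 0.5) -> torch.Tensor:
    """Late (score-level) fusion of the joint stream and the bone stream."""
    return w_joint * joint_logits + w_bone * bone_logits
```

In such a setup, the bone stream would feed joint-difference vectors through the same layers, and the proximity bias could be recomputed per frame (frame-based), shared across a sample (sample-based), or shared across an interaction type (type-based), mirroring the three graph variants described in the abstract.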
Author Yang, Weifeng
Wu, Yaqi
Sun, Qiumei
Li, Meng
Author_xml – sequence: 1
  givenname: Meng
  orcidid: 0000-0003-3497-4391
  surname: Li
  fullname: Li, Meng
  email: mli269-c@my.cityu.edu.hk
  organization: College of Mathematics and Statistics, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
– sequence: 2
  givenname: Yaqi
  orcidid: 0009-0009-1126-366X
  surname: Wu
  fullname: Wu, Yaqi
  organization: College of Mathematics and Statistics, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
– sequence: 3
  givenname: Qiumei
  surname: Sun
  fullname: Sun, Qiumei
  organization: Yiban Development Center, Hebei University of Economics and Business, Shijiazhuang, Hebei, China
– sequence: 4
  givenname: Weifeng
  surname: Yang
  fullname: Yang, Weifeng
  organization: Vipshop, Shanghai, China
CODEN IAECCG
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024
DOI 10.1109/ACCESS.2024.3516511
DatabaseName IEEE Xplore (IEEE)
IEEE Xplore Open Access Journals
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Engineered Materials Abstracts
METADEX
Technology Research Database
Materials Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DOAJ Directory of Open Access Journals
DatabaseTitle CrossRef
Materials Research Database
Engineered Materials Abstracts
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
METADEX
Computer and Information Systems Abstracts Professional
DatabaseTitleList Materials Research Database


Database_xml – sequence: 1
  dbid: DOA
  name: Directory of Open Access Journals
  url: https://www.doaj.org/
  sourceTypes: Open Website
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 2169-3536
EndPage 193100
ExternalDocumentID oai_doaj_org_article_4b01a8a1a0794dd7bd627f4e8bf73052
10_1109_ACCESS_2024_3516511
10795464
Genre orig-research
GrantInformation_xml – fundername: Hebei University of Economics and Business
  grantid: 2024ZD10
  funderid: 10.13039/100008314
– fundername: Science Research Project of Hebei Education Department
  grantid: ZD2021319
ISSN 2169-3536
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Language English
License https://creativecommons.org/licenses/by/4.0/legalcode
LinkModel DirectLink
ORCID 0000-0003-3497-4391
0009-0009-1126-366X
OpenAccessLink https://doaj.org/article/4b01a8a1a0794dd7bd627f4e8bf73052
PQID 3149094635
PQPubID 4845423
PageCount 10
ParticipantIDs proquest_journals_3149094635
crossref_primary_10_1109_ACCESS_2024_3516511
ieee_primary_10795464
doaj_primary_oai_doaj_org_article_4b01a8a1a0794dd7bd627f4e8bf73052
PublicationCentury 2000
PublicationDate 2024
PublicationDate_xml – year: 2024
  text: 20240000
PublicationDecade 2020
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE Access
PublicationTitleAbbrev Access
PublicationYear 2024
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
SourceID doaj
proquest
crossref
ieee
SourceType Open Website
Aggregation Database
Index Database
Publisher
StartPage 193091
SubjectTerms Accuracy
Bones
Computer vision
Data mining
Data models
Datasets
Feature extraction
Graphs
Human activity recognition
Human motion
Interaction recognition
Joints
Proximity
proximity graphs
Recognition
Semantics
Spatiotemporal data
Spatiotemporal phenomena
transformer
Transformers
two-stream networks
Video data
Title Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information
URI https://ieeexplore.ieee.org/document/10795464
https://www.proquest.com/docview/3149094635
https://doaj.org/article/4b01a8a1a0794dd7bd627f4e8bf73052
Volume 12