Two-Stream Proximity Graph Transformer for Skeletal Person-Person Interaction Recognition With Statistical Information
Published in | IEEE Access, Vol. 12, pp. 193091-193100 |
---|---|
Main Authors | Li, Meng; Wu, Yaqi; Sun, Qiumei; Yang, Weifeng |
Format | Journal Article |
Language | English |
Published | Piscataway: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 2024 |
ISSN | 2169-3536 |
DOI | 10.1109/ACCESS.2024.3516511 |
Abstract | Recognizing person-person interactions has practical significance in many fields, such as video understanding and video surveillance. Compared with RGB data, skeletal data can depict articulated human movements more accurately because it records joint locations in detail. Following the recent success of Transformers in computer vision, many researchers have begun to apply Transformers to person-person interaction recognition. However, these Transformer-based models do not fully account for the dynamic spatiotemporal relationship between interacting people, which remains a challenge. To address this challenge, we propose a novel Transformer-based model, the Two-Stream Proximity Graph Transformer (2s-PGT), to recognize skeletal person-person interactions. First, we design three types of proximity graphs based on skeletal data to encode the dynamic proximity relationship between interacting people: frame-based, sample-based and type-based proximity graphs. Second, we embed the proximity graphs into our Transformer-based model to jointly learn the relationship between interacting people from spatiotemporal and semantic perspectives. Third, we investigate a two-stream framework that integrates the information of interactive joints and interactive bones to improve interaction recognition accuracy. Experimental results on three public datasets, SBU (99.07%), NTU-RGB+D (Cross-Subject 95.72%, Cross-View 97.87%) and NTU-RGB+D120 (Cross-Subject 92.01%, Cross-View 91.65%), demonstrate that our approach outperforms the state-of-the-art methods. |
---|---|
Authors and affiliations | 1. Li, Meng (ORCID 0000-0003-3497-4391; mli269-c@my.cityu.edu.hk), College of Mathematics and Statistic, Hebei University of Economics and Business, Shijiazhuang, Hebei, China; 2. Wu, Yaqi (ORCID 0009-0009-1126-366X), College of Mathematics and Statistic, Hebei University of Economics and Business, Shijiazhuang, Hebei, China; 3. Sun, Qiumei, Yiban Development Center, Hebei University of Economics and Business, Shijiazhuang, Hebei, China; 4. Yang, Weifeng, Vipshop, Shanghai, China |
CODEN | IAECCG |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
Indexed in | IEEE Xplore; IEEE Xplore Open Access Journals; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Engineered Materials Abstracts; METADEX; Technology Research Database; Materials Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts – Academic; Computer and Information Systems Abstracts Professional; DOAJ Directory of Open Access Journals |
Discipline | Engineering |
Genre | orig-research |
Funding | Hebei University of Economics and Business (grant 2024ZD10; funder ID 10.13039/100008314); Science Research Project of Hebei Education Department (grant ZD2021319) |
Access status | Open access; peer-reviewed; scholarly |
License | https://creativecommons.org/licenses/by/4.0/legalcode |
OpenAccessLink | https://doaj.org/article/4b01a8a1a0794dd7bd627f4e8bf73052 |
PageCount | 10 |
Subjects | Accuracy; Bones; Computer vision; Data mining; Data models; Datasets; Feature extraction; Graphs; Human activity recognition; Human motion; Interaction recognition; Joints; Proximity; proximity graphs; Recognition; Semantics; Spatiotemporal data; Spatiotemporal phenomena; transformer; Transformers; two-stream networks; Video data |
Links | https://ieeexplore.ieee.org/document/10795464; https://www.proquest.com/docview/3149094635; https://doaj.org/article/4b01a8a1a0794dd7bd627f4e8bf73052 |
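The abstract names two mechanisms, proximity graphs between interacting people and a joint/bone two-stream framework, but the record gives no formulas. The Python sketch below illustrates one plausible reading under stated assumptions; it is not the authors' 2s-PGT implementation. It assumes (a) a frame-based proximity graph whose edge weights are a Gaussian kernel of the Euclidean distance between each joint of person A and each joint of person B, (b) bone vectors computed as a joint minus its parent joint, the convention used by two-stream skeleton models such as 2s-AGCN, and (c) late fusion as a weighted sum of the two streams' softmax scores. The function names, the `sigma` and `alpha` parameters, and the toy 3-joint skeleton are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vec = Tuple[float, ...]  # one joint (x, y, z) or one bone vector


def frame_proximity_graph(person_a: Sequence[Vec], person_b: Sequence[Vec],
                          sigma: float = 1.0) -> List[List[float]]:
    """Hypothetical frame-based proximity graph for a single frame.

    Entry [i][j] is exp(-d^2 / (2 * sigma^2)), where d is the Euclidean
    distance between joint i of person A and joint j of person B.
    """
    graph = []
    for ja in person_a:
        row = []
        for jb in person_b:
            d = math.dist(ja, jb)  # Euclidean distance, Python 3.8+
            row.append(math.exp(-(d * d) / (2.0 * sigma * sigma)))
        graph.append(row)
    return graph


def bones_from_joints(joints: Sequence[Vec], parents: Sequence[int]) -> List[Vec]:
    """Bone j = joint j minus its parent joint; the root (parent -1) maps to a zero vector."""
    bones: List[Vec] = []
    for j, p in enumerate(parents):
        if p < 0:
            bones.append(tuple(0.0 for _ in joints[j]))
        else:
            bones.append(tuple(c - pc for c, pc in zip(joints[j], joints[p])))
    return bones


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fuse_scores(joint_logits: Sequence[float], bone_logits: Sequence[float],
                alpha: float = 0.5) -> List[float]:
    """Hypothetical late fusion: alpha * P_joint + (1 - alpha) * P_bone."""
    pj, pb = softmax(joint_logits), softmax(bone_logits)
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(pj, pb)]


if __name__ == "__main__":
    # Toy skeletons with three joints each; joint 2 of A and joint 2 of B coincide.
    parents = [-1, 0, 1]
    a = [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 1.0, 0.0)]
    b = [(1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.5, 1.0, 0.0)]
    print(frame_proximity_graph(a, b)[2][2])   # 1.0 (distance 0)
    print(bones_from_joints(a, parents))       # [(0.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.5, 0.0, 0.0)]
    print(fuse_scores([2.0, 0.0], [0.0, 2.0]))  # approximately [0.5, 0.5]
```

A Gaussian kernel keeps every weight in (0, 1] and lets nearby joint pairs, such as touching hands, dominate the graph; `sigma` controls how fast the weight decays with distance and would need tuning to the coordinate scale of each dataset.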