Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
Published in | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 1046 - 1056 |
---|---|
Main Authors | Materzynska, Joanna; Xiao, Tete; Herzig, Roei; Xu, Huijuan; Wang, Xiaolong; Darrell, Trevor |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.06.2020 |
Subjects | Cognition; Computational modeling; Detectors; Feature extraction; Task analysis; Training; Videos |
Abstract | Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an agent performing an action. To train our model, we collect dense object box annotations on the Something-Something dataset. We propose a novel compositional action recognition task where the training combinations of verbs and nouns do not overlap with the test set. The novel aspects of our model are applicable to activities with prominent object interaction dynamics and to objects which can be tracked using state-of-the-art approaches; for activities without clearly defined spatial object-agent interactions, we rely on baseline scene-level spatio-temporal representations. We show the effectiveness of our approach not only on the proposed compositional action recognition task but also in a few-shot compositional setting which requires the model to generalize across both object appearance and action category. |
---|---|
Author | Xiao, Tete; Xu, Huijuan; Wang, Xiaolong; Darrell, Trevor; Herzig, Roei; Materzynska, Joanna |
Author_xml | 1. Materzynska, Joanna (University of Oxford, TwentyBN); 2. Xiao, Tete (UC Berkeley); 3. Herzig, Roei (Tel Aviv University); 4. Xu, Huijuan (UC Berkeley); 5. Wang, Xiaolong (UC Berkeley); 6. Darrell, Trevor (UC Berkeley) |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/CVPR42600.2020.00113 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Applied Sciences |
EISBN | 9781728171685 1728171687 |
EISSN | 1063-6919 |
EndPage | 1056 |
ExternalDocumentID | 9156858 |
Genre | orig-research |
GroupedDBID | 6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO |
ID | FETCH-LOGICAL-i203t-d7b56239f83d447c9caea1676713c00e2de65da21edc622aafb9a647e01480ea3 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:30:34 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i203t-d7b56239f83d447c9caea1676713c00e2de65da21edc622aafb9a647e01480ea3 |
PageCount | 11 |
ParticipantIDs | ieee_primary_9156858 |
PublicationCentury | 2000 |
PublicationDate | 2020-Jun |
PublicationDateYYYYMMDD | 2020-06-01 |
PublicationDate_xml | – month: 06 year: 2020 text: 2020-Jun |
PublicationDecade | 2020 |
PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2020 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0003211698 |
Score | 2.563481 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1046 |
SubjectTerms | Cognition; Computational modeling; Detectors; Feature extraction; Task analysis; Training; Videos |
Title | Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks |
URI | https://ieeexplore.ieee.org/document/9156858 |
hasFullText | 1 |
inHoldings | 1 |
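
The abstract describes a compositional task in which the verb-noun combinations seen in training do not overlap with those in the test set. The following is a minimal illustrative sketch of such a split, not the authors' actual procedure: the function names, the `(verb, noun, clip_id)` sample layout, and the toy labels are all assumptions made for this example.

```python
def compositional_split(samples, held_out_pairs):
    """Route every sample whose (verb, noun) pair is held out to the test set.

    `samples` is an iterable of (verb, noun, clip_id) tuples (hypothetical layout).
    `held_out_pairs` is an iterable of (verb, noun) pairs reserved for testing.
    """
    held_out = set(held_out_pairs)
    train, test = [], []
    for verb, noun, clip_id in samples:
        target = test if (verb, noun) in held_out else train
        target.append((verb, noun, clip_id))
    return train, test


def combinations(split):
    """Return the set of (verb, noun) pairs that occur in a split."""
    return {(verb, noun) for verb, noun, _ in split}


# Toy data: every verb and every noun appears in both splits,
# but no verb-noun combination is shared between them.
samples = [
    ("push", "cup", "clip_001"),
    ("push", "book", "clip_002"),
    ("lift", "cup", "clip_003"),
    ("lift", "book", "clip_004"),
]
train, test = compositional_split(samples, [("push", "book"), ("lift", "cup")])

# The defining property of the task: combinations are disjoint across splits.
assert combinations(train).isdisjoint(combinations(test))
```

In this toy setup a model sees "push" and "book" separately during training but never together, so test accuracy on "push book" reflects generalization to an unseen combination rather than memorized object appearance.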