Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks

Bibliographic Details
Published in	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) pp. 1046 - 1056
Main Authors	Materzynska, Joanna; Xiao, Tete; Herzig, Roei; Xu, Huijuan; Wang, Xiaolong; Darrell, Trevor
Format	Conference Proceeding
Language	English
Published	IEEE 01.06.2020
Subjects	Cognition; Computational modeling; Detectors; Feature extraction; Task analysis; Training; Videos

Abstract	Human action is naturally compositional: humans can easily recognize and perform actions with objects that are different from those used in training demonstrations. In this paper, we study the compositionality of action by looking into the dynamics of subject-object interactions. We propose a novel model which can explicitly reason about the geometric relations between constituent objects and an agent performing an action. To train our model, we collect dense object box annotations on the Something-Something dataset. We propose a novel compositional action recognition task where the training combinations of verbs and nouns do not overlap with the test set. The novel aspects of our model are applicable to activities with prominent object interaction dynamics and to objects which can be tracked using state-of-the-art approaches; for activities without clearly defined spatial object-agent interactions, we rely on baseline scene-level spatio-temporal representations. We show the effectiveness of our approach not only on the proposed compositional action recognition task but also in a few-shot compositional setting which requires the model to generalize across both object appearance and action category.
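The compositional task in the abstract requires that verb-noun combinations seen in training never appear at test time. The following is a minimal sketch of one way to build such a split, not the authors' procedure: it assumes each clip is labeled with a hypothetical `(verb, noun)` pair and that verbs and nouns are each partitioned into two groups (`verbs_a` and `nouns_1` are illustrative names, not from the record).

```python
from typing import Iterable, List, Set, Tuple

Sample = Tuple[str, str]  # hypothetical label format: (verb, noun)


def compositional_split(
    samples: Iterable[Sample],
    verbs_a: Set[str],
    nouns_1: Set[str],
) -> Tuple[List[Sample], List[Sample]]:
    """Split samples so train and test share no (verb, noun) combination.

    Assumption: verbs are partitioned into group A and "the rest", nouns
    into group 1 and "the rest". Pairs whose verb and noun fall on the
    same side of their partitions go to train; mixed pairs go to test.
    """
    train: List[Sample] = []
    test: List[Sample] = []
    for verb, noun in samples:
        same_side = (verb in verbs_a) == (noun in nouns_1)
        (train if same_side else test).append((verb, noun))
    return train, test


# Toy labels, invented for illustration only.
samples = [("push", "cup"), ("push", "book"), ("lift", "cup"), ("lift", "book")]
train, test = compositional_split(samples, verbs_a={"push"}, nouns_1={"cup"})

# No combination appears on both sides, yet every verb and noun in this
# toy example is still seen during training.
assert set(train).isdisjoint(test)
```

In this toy case both verbs and both nouns occur in training, but each test pair is a combination the model has never seen, which is the property the abstract describes.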
Author	Xiao, Tete; Xu, Huijuan; Wang, Xiaolong; Darrell, Trevor; Herzig, Roei; Materzynska, Joanna
Author_xml	1. Materzynska, Joanna (University of Oxford, TwentyBN); 2. Xiao, Tete (UC Berkeley); 3. Herzig, Roei (Tel Aviv University); 4. Xu, Huijuan (UC Berkeley); 5. Wang, Xiaolong (UC Berkeley); 6. Darrell, Trevor (UC Berkeley)
CODEN	IEEPAD
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/CVPR42600.2020.00113
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL); IEEE Proceedings Order Plans (POP) 1998-present
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Applied Sciences
EISBN	9781728171685 1728171687
EISSN	1063-6919
EndPage	1056
ExternalDocumentID	9156858
Genre	orig-research
GroupedDBID	6IE 6IH 6IL 6IN AAWTH ABLEC ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP OCL RIE RIL RIO
ID	FETCH-LOGICAL-i203t-d7b56239f83d447c9caea1676713c00e2de65da21edc622aafb9a647e01480ea3
IEDL.DBID	RIE
IngestDate	Wed Aug 27 02:30:34 EDT 2025
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i203t-d7b56239f83d447c9caea1676713c00e2de65da21edc622aafb9a647e01480ea3
PageCount	11
ParticipantIDs	ieee_primary_9156858
PublicationCentury	2000
PublicationDate	2020-Jun
PublicationDateYYYYMMDD	2020-06-01
PublicationDate_xml	– month: 06 year: 2020 text: 2020-Jun
PublicationDecade	2020
PublicationTitle	Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online)
PublicationTitleAbbrev	CVPR
PublicationYear	2020
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0003211698
Score	2.563481
SourceID	ieee
SourceType	Publisher
StartPage	1046
Title	Something-Else: Compositional Action Recognition With Spatial-Temporal Interaction Networks
URI	https://ieeexplore.ieee.org/document/9156858
hasFullText	1
inHoldings	1