A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information
| Field | Value |
|---|---|
| Published in | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online), pp. 13979-13989 |
| Main Authors | Kowal, Matthew; Siam, Mennatullah; Islam, Md Amirul; Bruce, Neil D. B.; Wildes, Richard P.; Derpanis, Konstantinos G. |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.06.2022 |
| ISSN | 1063-6919 |
| DOI | 10.1109/CVPR52688.2022.01361 |
Abstract | Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is a limited understanding of what information is captured by these models in their intermediate representations. For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static frames, there is no quantitative methodology for evaluating such static bias in the latent representation compared to bias toward dynamic information (e.g. motion). We tackle this challenge by proposing a novel approach for quantifying the static and dynamic biases of any spatiotemporal model. To show the efficacy of our approach, we analyse two widely studied tasks, action recognition and video object segmentation. Our key findings are threefold: (i) Most examined spatiotemporal models are biased toward static information, although certain two-stream architectures with cross-connections show a better balance between the static and dynamic information captured. (ii) Some datasets that are commonly assumed to be biased toward dynamics are actually biased toward static information. (iii) Individual units (channels) in an architecture can be biased toward static, dynamic or a combination of the two. (Project page and code available.) |
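The abstract does not spell out the quantification procedure itself. Purely as an illustrative sketch of the general idea (not the paper's actual method), one common way to probe static bias is to compare a model's representation of a clip against its representation of a "frozen" clip in which a single frame is repeated, so all motion is removed. Everything here is hypothetical: `toy_model` stands in for any spatiotemporal feature extractor, and `static_bias_score` is an assumed name for a simple cosine-similarity probe.

```python
import numpy as np

def static_bias_score(feats_video: np.ndarray, feats_static: np.ndarray) -> float:
    """Cosine similarity between features of the original clip and of a
    motion-free (single-frame-repeated) clip. A value near 1 suggests the
    representation is dominated by static appearance rather than motion."""
    a = feats_video / np.linalg.norm(feats_video)
    b = feats_static / np.linalg.norm(feats_static)
    return float(a @ b)

def toy_model(clip: np.ndarray) -> np.ndarray:
    # Hypothetical stand-in for a spatiotemporal network:
    # average per-frame features over time.
    return clip.mean(axis=0)

rng = np.random.default_rng(0)
clip = rng.normal(size=(8, 16))          # 8 frames, 16-dim per-frame features
static = np.repeat(clip[:1], 8, axis=0)  # first frame repeated: motion removed
score = static_bias_score(toy_model(clip), toy_model(static))
```

A cosine score near 1 would indicate that removing motion barely changes the representation; the paper's analysis is of course far more involved than this toy probe.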
Author | Matthew Kowal (York University); Mennatullah Siam (York University); Md Amirul Islam (Vector Institute for AI); Neil D. B. Bruce (Vector Institute for AI); Richard P. Wildes (York University); Konstantinos G. Derpanis (York University) |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
Discipline | Applied Sciences |
EISBN | 1665469463 9781665469463 |
EISSN | 1063-6919 |
EndPage | 13989 |
ExternalDocumentID | 9879991 |
Genre | orig-research |
GrantInformation | York University (funder ID: 10.13039/501100000105) |
PageCount | 11 |
PublicationDate | 2022-June |
PublicationTitle | Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2022 |
Publisher | IEEE |
StartPage | 13979 |
SubjectTerms | Computational modeling; Computer architecture; Computer vision; Dynamics; Grouping and shape analysis; Heuristic algorithms; Object segmentation; Video analysis and understanding; Action and event recognition; Explainable computer vision; Segmentation; Visualization |
Title | A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information |
URI | https://ieeexplore.ieee.org/document/9879991 |