A Deeper Dive Into What Deep Spatiotemporal Networks Encode: Quantifying Static vs. Dynamic Information

Bibliographic Details
Published in: Proceedings (IEEE Computer Society Conference on Computer Vision and Pattern Recognition. Online) (CVPR), pp. 13979–13989
Main Authors: Kowal, Matthew; Siam, Mennatullah; Islam, Md Amirul; Bruce, Neil D. B.; Wildes, Richard P.; Derpanis, Konstantinos G.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2022
Subjects: Computational modeling; Computer architecture; Computer vision; Dynamics; Grouping and shape analysis; Heuristic algorithms; Object segmentation; Video analysis and understanding; Action and event recognition; Explainable computer vision; Segmentation; Visualization
Online Access: https://ieeexplore.ieee.org/document/9879991
ISSN: 1063-6919
DOI: 10.1109/CVPR52688.2022.01361

Abstract: Deep spatiotemporal models are used in a variety of computer vision tasks, such as action recognition and video object segmentation. Currently, there is limited understanding of what information these models capture in their intermediate representations. For example, while it has been observed that action recognition algorithms are heavily influenced by visual appearance in single static frames, there is no quantitative methodology for evaluating such static bias in the latent representation compared to bias toward dynamic information (e.g., motion). We tackle this challenge by proposing a novel approach for quantifying the static and dynamic biases of any spatiotemporal model. To show the efficacy of our approach, we analyse two widely studied tasks, action recognition and video object segmentation. Our key findings are threefold: (i) Most examined spatiotemporal models are biased toward static information, although certain two-stream architectures with cross-connections show a better balance between the static and dynamic information captured. (ii) Some datasets that are commonly assumed to be biased toward dynamics are actually biased toward static information. (iii) Individual units (channels) in an architecture can be biased toward static, dynamic, or a combination of the two. Project page and code are available online.
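The quantification the abstract describes invites a concrete illustration. Below is a minimal PyTorch sketch of one way a per-channel static/dynamic bias score could be computed; it is an assumption-laden illustration, not the authors' published procedure. It presumes access to pairs of clips that share the static factor (appearance) and pairs that share the dynamic factor (motion), and a model that returns a 5D feature tensor; the function names and the pairing scheme are hypothetical.

# Minimal sketch (illustrative, not the authors' exact method): score each
# channel of a spatiotemporal model as static- or dynamic-biased by
# correlating its pooled activation across clip pairs that share one factor.
# Assumptions: `model` maps a (1, C, T, H, W) clip to (1, C', T', H', W')
# features; each pair list holds (clip_a, clip_b) tensors of shape
# (C, T, H, W), with N >= 2 pairs per list.
import torch

@torch.no_grad()
def per_channel_bias(model, static_pairs, dynamic_pairs):
    def pooled(clip):
        feats = model(clip.unsqueeze(0))         # (1, C', T', H', W') assumed
        return feats.mean(dim=(0, 2, 3, 4))      # one value per channel: (C',)

    def corr(pairs):
        a = torch.stack([pooled(x) for x, _ in pairs])   # (N, C')
        b = torch.stack([pooled(y) for _, y in pairs])   # (N, C')
        a = (a - a.mean(0)) / (a.std(0) + 1e-8)          # standardize per channel
        b = (b - b.mean(0)) / (b.std(0) + 1e-8)
        return (a * b).mean(0)                           # Pearson r per channel

    r_static = corr(static_pairs)    # high => channel agrees across shared appearance
    r_dynamic = corr(dynamic_pairs)  # high => channel agrees across shared motion
    return r_static, r_dynamic

Aggregating these per-channel scores over a layer would yield the kind of model-level static-versus-dynamic comparison, and the per-unit labels (static, dynamic, or joint), that the abstract reports.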
Authors and Affiliations:
– Kowal, Matthew (York University; m2kowal@eecs.yorku.ca)
– Siam, Mennatullah (York University; msiam@eecs.yorku.ca)
– Islam, Md Amirul (Vector Institute for AI; mdamirul@ryerson.ca)
– Bruce, Neil D. B. (Vector Institute for AI; brucen@uoguelph.ca)
– Wildes, Richard P. (York University; wildes@eecs.yorku.ca)
– Derpanis, Konstantinos G. (York University; kosta@eecs.yorku.ca)
CODEN: IEEPAD
EISBN: 9781665469463
Funding: York University (funder ID 10.13039/501100000105)