Mutual modality learning for video action classification

Bibliographic Details
Published in: Kompʹûternaâ optika (Computer Optics), Vol. 47, no. 4, pp. 637-649
Main authors: Komkov, S.A.; Dzabraev, M.D.; Petiushko, A.A.
Format: Journal article
Language: English
Published: Samara National Research University, 1 August 2023

Abstract
The construction of models for video action classification is progressing rapidly. However, the performance of these models can still be easily improved by ensembling them with the same models trained on different modalities (e.g., optical flow). Unfortunately, using several modalities during inference is computationally expensive. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model. Yet there is still room for improvement. In this paper, we explore various methods to embed the power of the ensemble into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
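The abstract contrasts a two-modality ensemble, which is accurate but needs an optical-flow model at inference time, with mutual modality learning, which keeps a single model at test time. The record does not state the training objective, so the following is only a minimal sketch under stated assumptions: it uses a generic mutual-learning loss (each model's cross-entropy plus a KL term pulling it toward its peer's prediction) and plain probability averaging for the ensemble. All function names, the `weight` parameter, and the toy logits are hypothetical; this is not the authors' implementation.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Numerically stable softmax over per-class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs: List[float], label: int) -> float:
    """Negative log-likelihood of the ground-truth class."""
    return -math.log(probs[label])


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) for two discrete distributions over the same classes."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def late_fusion(rgb_logits: List[float], flow_logits: List[float]) -> List[float]:
    """Hypothetical two-modality ensemble: average the class probabilities.
    Both models (and optical-flow extraction) must run at inference."""
    p_rgb, p_flow = softmax(rgb_logits), softmax(flow_logits)
    return [(a + b) / 2 for a, b in zip(p_rgb, p_flow)]


def mutual_learning_losses(
    rgb_logits: List[float],
    flow_logits: List[float],
    label: int,
    weight: float = 1.0,  # hypothetical weight of the peer (KL) term
) -> Tuple[float, float]:
    """Assumed generic mutual-learning objective: each model fits the label
    and is pulled toward its peer's prediction. In a real framework the peer's
    probabilities would be detached (treated as constants) for each gradient."""
    p_rgb, p_flow = softmax(rgb_logits), softmax(flow_logits)
    loss_rgb = cross_entropy(p_rgb, label) + weight * kl_divergence(p_flow, p_rgb)
    loss_flow = cross_entropy(p_flow, label) + weight * kl_divergence(p_rgb, p_flow)
    return loss_rgb, loss_flow


# Toy example with three classes; the true class is 0.
rgb_logits = [2.0, 0.5, -1.0]
flow_logits = [1.0, 1.5, -0.5]
fused = late_fusion(rgb_logits, flow_logits)
loss_rgb, loss_flow = mutual_learning_losses(rgb_logits, flow_logits, label=0)
```

The sketch only computes loss values and performs no optimization. The design point it illustrates is that the KL terms are needed only during training, so after mutual learning the RGB model can be deployed alone, whereas `late_fusion` requires both models for every prediction.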

Cited by: DOI 10.1109/TCSVT.2024.3398624
Corporate authors: Lomonosov Moscow State University; Huawei Moscow Research Center
DOI: 10.18287/2412-6179-CO-1277
Discipline: Applied Sciences
EISSN: 2412-6179
ISSN: 0134-2452
Access: open access; peer-reviewed, scholarly
Online access: https://doaj.org/article/af4310751a1d4d4fa3b112a2f21ed5bd
Page count: 13
Subject terms: mutual learning; optical flow; video action classification; video labeling; video recognition