Mutual modality learning for video action classification

Bibliographic Details
Published in	Kompʹûternaâ optika Vol. 47; no. 4; pp. 637 - 649
Main Authors	Komkov, S.A., Dzabraev, M.D., Petiushko, A.A.
Format	Journal Article
Language	English
Published	Samara National Research University 01.08.2023
Subjects	mutual learning; optical flow; video action classification; video labeling; video recognition
Abstract	Models for video action classification are improving rapidly. However, their performance can still be easily improved by ensembling them with the same models trained on other modalities (e.g., optical flow). Unfortunately, using several modalities during inference is computationally expensive. Recent works examine ways to integrate the advantages of multi-modality into a single RGB model, yet there is still room for improvement. In this paper, we explore various methods to embed the ensemble power into a single model. We show that proper initialization, as well as mutual modality learning, enhances single-modality models. As a result, we achieve state-of-the-art results on the Something-Something-v2 benchmark.
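The abstract contrasts two ideas: ensembling models trained on different modalities at inference time, and letting the models learn from each other so that a single model can be used alone. This record does not give the paper's loss function, so the sketch below only illustrates the general idea and should not be read as the authors' method. It assumes the generic mutual learning formulation (cross-entropy on the label plus a KL-divergence term toward a peer model's predictions) and simple late fusion by averaging probabilities. All function names, the `weight` parameter, and the logits are hypothetical.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, label):
    """Negative log-likelihood of the true class."""
    return -math.log(probs[label])

def kl_divergence(p, q):
    """KL(p || q); assumes q has no zero entries."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mutual_loss(own_logits, peer_logits, label, weight=1.0):
    """Supervised loss plus a term pulling predictions toward the peer's.

    Assumed formulation for illustration; the paper may use a different one.
    """
    own = softmax(own_logits)
    peer = softmax(peer_logits)  # treated as a fixed target in this sketch
    return cross_entropy(own, label) + weight * kl_divergence(peer, own)

def ensemble(rgb_logits, flow_logits):
    """Late fusion: average the class probabilities of the two models."""
    rgb, flow = softmax(rgb_logits), softmax(flow_logits)
    return [(a + b) / 2 for a, b in zip(rgb, flow)]

# Hypothetical logits for one clip over three action classes.
rgb_logits = [2.0, 0.5, -1.0]
flow_logits = [1.0, 1.5, -0.5]
print(mutual_loss(rgb_logits, flow_logits, label=0))
print(ensemble(rgb_logits, flow_logits))
```

In this sketch the ensemble needs both models at inference time, whereas a model trained with `mutual_loss` is evaluated on its own, which is the cost trade-off the abstract describes.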
CorporateAuthor	Lomonosov Moscow State University; Huawei Moscow Research Center
DOI	10.18287/2412-6179-CO-1277
Discipline	Applied Sciences
EISSN	2412-6179
ISSN	0134-2452
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
OpenAccessLink	https://doaj.org/article/af4310751a1d4d4fa3b112a2f21ed5bd