Cross-Modal Contrastive Learning Network for Few-Shot Action Recognition
Published in | IEEE transactions on image processing Vol. 33; pp. 1257 - 1271 |
---|---|
Main Authors | Wang, Xiao; Yan, Yan; Hu, Hai-Miao; Li, Bo; Wang, Hanzi |
Format | Journal Article |
Language | English |
Published | IEEE (The Institute of Electrical and Electronics Engineers, Inc.), United States, 2024 |
Abstract | Few-shot action recognition aims to recognize new, unseen categories from only a few labeled samples per class. However, it still suffers from inadequate training data, which easily leads to overfitting and poor generalization. Therefore, we propose a cross-modal contrastive learning network (CCLN), consisting of an adversarial branch and a contrastive branch, to perform effective few-shot action recognition. In the adversarial branch, we design a prototypical generative adversarial network (PGAN) that synthesizes additional training samples, which mitigates data scarcity and thereby alleviates overfitting. When training samples are limited, the obtained visual features are usually suboptimal for video understanding because they lack discriminative information. To address this issue, in the contrastive branch, we propose a cross-modal contrastive learning module (CCLM) that uses semantic information to obtain discriminative feature representations of samples, enhancing the network's feature learning ability at the class level. Moreover, since videos contain crucial sequence and ordering information, we introduce a spatial-temporal enhancement module (SEM) to model the spatial context within video frames and the temporal context across video frames. Experimental results show that the proposed CCLN outperforms state-of-the-art few-shot action recognition methods on four challenging benchmarks: Kinetics, UCF101, HMDB51 and SSv2. |
---|---|
Author | Li, Bo; Yan, Yan; Wang, Hanzi; Wang, Xiao; Hu, Hai-Miao |
Author details | (1) Xiao Wang (xiaowang@stu.xmu.edu.cn), Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China; (2) Yan Yan (ORCID 0000-0002-3674-7160; yanyan@xmu.edu.cn), Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China; (3) Hai-Miao Hu (ORCID 0000-0001-6811-9209; frank0139@163.com), School of Computer Science and Engineering, Beihang University, Beijing, China; (4) Bo Li (ORCID 0000-0001-5980-4861; boli@buaa.edu.cn), School of Computer Science and Engineering, Beihang University, Beijing, China; (5) Hanzi Wang (ORCID 0000-0002-6913-9786; hanzi.wang@xmu.edu.cn), Fujian Key Laboratory of Sensing and Computing for Smart City, School of Informatics, Xiamen University, Xiamen, China |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/38252570 (view this record in MEDLINE/PubMed) |
CODEN | IIPRE4 |
CitedBy_id | crossref_primary_10_1109_OJCS_2024_3406645 |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 |
DOI | 10.1109/TIP.2024.3354104 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) Online; IEEE Electronic Library (IEL); PubMed; CrossRef; Computer and Information Systems Abstracts; Electronics & Communications Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional; MEDLINE - Academic |
Discipline | Applied Sciences Engineering |
EISSN | 1941-0042 |
EndPage | 1271 |
ExternalDocumentID | 10_1109_TIP_2024_3354104 38252570 10411850 |
Genre | orig-research Journal Article |
Funding | National Natural Science Foundation of China (grants U21A20514, 62122011, 62372388 and 62071404; funder ID 10.13039/501100001809); National Key Research and Development Program of China (grant 2022ZD0160402; funder ID 10.13039/501100012166); Fuxiaquan National Independent Innovation Demonstration Zone Collaborative Innovation Platform Project (grant 3502ZCQXT2022008) |
ISSN | 1057-7149 |
IsPeerReviewed | true |
IsScholarly | true |
ORCID | 0000-0002-3674-7160 0000-0001-5980-4861 0000-0002-6913-9786 0000-0001-6811-9209 |
PMID | 38252570 |
PageCount | 15 |
PublicationDate | 2024 |
PublicationPlace | New York, United States |
PublicationTitle | IEEE transactions on image processing |
PublicationTitleAbbrev | TIP |
PublicationTitleAlternate | IEEE Trans Image Process |
PublicationYear | 2024 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
StartPage | 1257 |
SubjectTerms | action recognition; Activity recognition; Context; contrastive learning; Feature extraction; Few-shot learning; Frames (data processing); Generative adversarial networks; Image recognition; Machine learning; meta-learning; Modules; Self-supervised learning; Semantics; Task analysis; Three-dimensional displays; video understanding; Visual discrimination; Visualization |
Title | Cross-Modal Contrastive Learning Network for Few-Shot Action Recognition |
URI | https://ieeexplore.ieee.org/document/10411850 https://www.ncbi.nlm.nih.gov/pubmed/38252570 https://www.proquest.com/docview/2926267309/abstract/ https://search.proquest.com/docview/2917866507 |
Volume | 33 |
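The abstract above describes a contrastive branch that pulls video features toward class-level semantic information, within a few-shot setting. The exact CCLM loss and the CCLN classifier are not given in this record, so the following is only a minimal pure-Python sketch under stated assumptions: it swaps in a generic InfoNCE-style cross-modal loss (each video feature contrasted against one semantic embedding per class, temperature 0.1 assumed) and a Prototypical Networks-style nearest-prototype classifier as a common few-shot evaluation step. All function names are hypothetical, and the semantic embeddings are assumed to come from some external text encoder.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    # Scale to unit length so dot products become cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cross_modal_contrastive_loss(
    visual: Sequence[Sequence[float]],
    labels: Sequence[int],
    text: Sequence[Sequence[float]],
    temperature: float = 0.1,  # assumed value, not from the paper
) -> float:
    """Hypothetical InfoNCE-style loss: each video feature is pulled toward the
    semantic embedding of its own class and pushed away from the other classes."""
    text_n = [l2_normalize(t) for t in text]
    total = 0.0
    for v, y in zip(visual, labels):
        v_n = l2_normalize(v)
        logits = [dot(v_n, t) / temperature for t in text_n]
        m = max(logits)  # subtract the max before exp for numerical stability
        log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
        total += log_sum_exp - logits[y]  # -log softmax probability of the true class
    return total / len(visual)


def prototypes(
    support: Sequence[Sequence[float]], support_labels: Sequence[int]
) -> Dict[int, Vector]:
    """Class prototype = mean of that class's support features."""
    sums: Dict[int, Vector] = {}
    counts: Dict[int, int] = defaultdict(int)
    for f, y in zip(support, support_labels):
        if y not in sums:
            sums[y] = [0.0] * len(f)
        sums[y] = [s + x for s, x in zip(sums[y], f)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def classify_query(query: Sequence[float], protos: Dict[int, Vector]) -> int:
    """Assign the query to the class with the nearest prototype (squared L2)."""
    return min(protos, key=lambda y: sum((q - p) ** 2 for q, p in zip(query, protos[y])))


if __name__ == "__main__":
    support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    protos = prototypes(support, [0, 0, 1, 1])
    print(classify_query([0.8, 0.2], protos))  # 0
    text = [[1.0, 0.0], [0.0, 1.0]]  # toy stand-ins for class semantic embeddings
    print(cross_modal_contrastive_loss(support, [0, 0, 1, 1], text))
```

With correctly paired labels the loss is near zero; swapping the labels drives it up, which is the pressure a class-level cross-modal contrastive term is meant to apply. Real implementations would operate on batched tensors and learned encoders rather than Python lists.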