Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos
Published in | IEEE Transactions on Image Processing, Vol. 27, No. 3, pp. 1347-1360 |
---|---|
Main Authors | Du, Wenbin; Wang, Yali; Qiao, Yu |
Format | Journal Article |
Language | English |
Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.03.2018 |
Abstract | Recent years have witnessed the popularity of using recurrent neural networks (RNNs) for action recognition in videos. However, videos are of high dimensionality and contain rich human dynamics at various motion scales, which makes it difficult for traditional RNNs to capture complex action information. In this paper, we propose a novel recurrent spatial-temporal attention network (RSTAN) to address this challenge, where we introduce a spatial-temporal attention mechanism to adaptively identify key features from the global video context for every time-step prediction of the RNN. More specifically, we make three main contributions. First, we reinforce the classical long short-term memory (LSTM) with a novel spatial-temporal attention module. At each time step, our module can automatically learn a spatial-temporal action representation from all sampled video frames, which is compact and highly relevant to the prediction at the current step. Second, we design an attention-driven appearance-motion fusion strategy to integrate appearance and motion LSTMs into a unified framework, where the LSTMs and their spatial-temporal attention modules in the two streams can be jointly trained in an end-to-end fashion. Third, we develop an actor-attention regularization for RSTAN, which guides our attention mechanism to focus on the important action regions around actors. We evaluate the proposed RSTAN on the benchmark UCF101, HMDB51 and JHMDB data sets. The experimental results show that our RSTAN outperforms other recent RNN-based approaches on UCF101 and HMDB51, and achieves the state of the art on JHMDB. |
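To make the mechanism described in the abstract concrete, the sketch below shows one possible reading of a single RSTAN-style recurrent attention step in PyTorch: at each time step the previous LSTM hidden state scores local CNN features from all sampled frames, spatial and temporal softmax weights pool them into a compact context vector, and the LSTM consumes that vector before classification. This is a hedged illustration, not the authors' code: the tensor shapes, the single-layer scoring MLP, the per-frame max used for temporal scores, and all names (`SpatialTemporalAttentionLSTM`, `attend`, `feat_dim`, `hid_dim`) are assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTemporalAttentionLSTM(nn.Module):
    """Simplified, assumed reading of an LSTM step with spatial-temporal attention."""

    def __init__(self, feat_dim=1024, hid_dim=512, num_classes=101):
        super().__init__()
        self.att_feat = nn.Linear(feat_dim, hid_dim)   # projects each local CNN feature
        self.att_hid = nn.Linear(hid_dim, hid_dim)     # projects the previous hidden state
        self.att_score = nn.Linear(hid_dim, 1)         # scalar relevance score per location
        self.lstm = nn.LSTMCell(feat_dim, hid_dim)
        self.classifier = nn.Linear(hid_dim, num_classes)

    def attend(self, feats, h_prev):
        # feats: (N, K, D) local CNN features, N sampled frames with K spatial locations each.
        scores = self.att_score(torch.tanh(
            self.att_feat(feats) + self.att_hid(h_prev)[None, None, :])).squeeze(-1)  # (N, K)
        spatial = F.softmax(scores, dim=1)                          # attention over locations
        frame_feats = (spatial.unsqueeze(-1) * feats).sum(dim=1)    # (N, D) per-frame summary
        temporal = F.softmax(scores.max(dim=1).values, dim=0)       # attention over frames
        return (temporal.unsqueeze(-1) * frame_feats).sum(dim=0)    # (D,) context vector

    def forward(self, feats, steps=8):
        h = feats.new_zeros(1, self.lstm.hidden_size)
        c = feats.new_zeros(1, self.lstm.hidden_size)
        step_logits = []
        for _ in range(steps):
            ctx = self.attend(feats, h.squeeze(0))       # attention-pooled video feature
            h, c = self.lstm(ctx.unsqueeze(0), (h, c))   # recurrent update
            step_logits.append(self.classifier(h.squeeze(0)))
        return torch.stack(step_logits).mean(dim=0)      # average the per-step predictions


# Example usage on random features standing in for CNN activations of 5 frames, 7x7 grid:
feats = torch.randn(5, 49, 1024)
logits = SpatialTemporalAttentionLSTM()(feats)           # (num_classes,)
```

Averaging the per-step class scores mirrors the common practice of aggregating time-step predictions from a recurrent classifier; the actor-attention regularization and the two-stream appearance-motion fusion mentioned in the abstract are omitted here to keep the sketch minimal.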
Author | Du, Wenbin; Wang, Yali; Qiao, Yu |
Author_xml | – sequence: 1 givenname: Wenbin surname: Du fullname: Du, Wenbin – sequence: 2 givenname: Yali surname: Wang fullname: Wang, Yali – sequence: 3 givenname: Yu orcidid: 0000-0002-1889-2567 surname: Qiao fullname: Qiao, Yu |
BackLink | https://www.ncbi.nlm.nih.gov/pubmed/29990061 (View this record in MEDLINE/PubMed) |
CODEN | IIPRE4 |
CitedBy_id | crossref_primary_10_1007_s10489_024_06001_z crossref_primary_10_3390_s22176554 crossref_primary_10_1109_TIP_2021_3100556 crossref_primary_10_1007_s11042_022_12219_1 crossref_primary_10_1109_TNSRE_2020_3039297 crossref_primary_10_1016_j_ins_2022_05_092 crossref_primary_10_1007_s10845_023_02318_7 crossref_primary_10_1016_j_neucom_2024_127975 crossref_primary_10_1007_s12517_021_09229_y crossref_primary_10_1016_j_eswa_2022_117730 crossref_primary_10_1109_TNNLS_2020_2978613 crossref_primary_10_1007_s00382_024_07348_2 crossref_primary_10_3233_IA_190021 crossref_primary_10_3390_s21175921 crossref_primary_10_1016_j_neucom_2024_127973 crossref_primary_10_1109_TSG_2022_3166600 crossref_primary_10_1155_2021_6690606 crossref_primary_10_1109_TIP_2023_3270105 crossref_primary_10_1109_TMM_2021_3050058 crossref_primary_10_1007_s00521_019_04516_y crossref_primary_10_1016_j_imavis_2019_08_009 crossref_primary_10_3390_s22134863 crossref_primary_10_1016_j_ins_2022_11_012 crossref_primary_10_1007_s00371_023_02988_7 crossref_primary_10_1038_s41538_025_00376_0 crossref_primary_10_1109_TMM_2022_3148588 crossref_primary_10_1007_s00138_018_0956_5 crossref_primary_10_1109_TSTE_2019_2897136 crossref_primary_10_1088_1361_6501_ad3ea6 crossref_primary_10_1109_TIP_2021_3113570 crossref_primary_10_1016_j_neucom_2021_03_120 crossref_primary_10_1007_s11220_022_00399_x crossref_primary_10_1109_TIM_2022_3204100 crossref_primary_10_1016_j_eswa_2022_118484 crossref_primary_10_1109_TIM_2023_3320734 crossref_primary_10_1007_s41095_022_0271_y crossref_primary_10_3390_s21144720 crossref_primary_10_3390_rs15030842 crossref_primary_10_1109_TIP_2021_3058599 crossref_primary_10_1109_TIP_2020_2987425 crossref_primary_10_1016_j_patcog_2023_110071 crossref_primary_10_1007_s11042_022_13068_8 crossref_primary_10_1016_j_knosys_2022_108786 crossref_primary_10_1016_j_semcancer_2022_08_005 crossref_primary_10_1109_JSEN_2020_3019258 crossref_primary_10_1109_TNNLS_2021_3105184 crossref_primary_10_3390_rs16132427 crossref_primary_10_1016_j_dsp_2022_103449 crossref_primary_10_1007_s11042_023_15022_8 crossref_primary_10_1155_2021_3155357 crossref_primary_10_1109_TITS_2022_3208952 crossref_primary_10_1109_ACCESS_2022_3206449 crossref_primary_10_1109_TMM_2018_2862341 crossref_primary_10_1371_journal_pone_0244647 crossref_primary_10_1109_TMM_2020_3011317 crossref_primary_10_1007_s10470_018_1306_2 crossref_primary_10_1007_s11042_023_14355_8 crossref_primary_10_1109_TIP_2020_2989864 crossref_primary_10_1109_TIM_2023_3302372 crossref_primary_10_1016_j_neucom_2020_06_032 crossref_primary_10_1109_TCSVT_2019_2958188 crossref_primary_10_1016_j_cie_2022_108559 crossref_primary_10_1016_j_neucom_2020_12_020 crossref_primary_10_1007_s00521_023_08559_0 crossref_primary_10_1016_j_patcog_2019_03_010 crossref_primary_10_1109_JSEN_2022_3143705 crossref_primary_10_1002_int_22591 crossref_primary_10_1109_TVT_2023_3313593 crossref_primary_10_1109_ACCESS_2021_3132787 crossref_primary_10_3390_electronics12173729 crossref_primary_10_1016_j_knosys_2024_112480 crossref_primary_10_3390_s25020497 crossref_primary_10_1109_TMM_2019_2953814 crossref_primary_10_1109_TMM_2021_3056892 crossref_primary_10_1016_j_egyr_2024_11_057 crossref_primary_10_1007_s11063_020_10248_1 crossref_primary_10_1109_TMM_2024_3386339 crossref_primary_10_12677_mos_2025_141002 crossref_primary_10_1371_journal_pone_0265115 crossref_primary_10_1002_hbm_26399 crossref_primary_10_1016_j_phycom_2021_101584 crossref_primary_10_3390_bioengineering11121180 crossref_primary_10_1109_TITS_2019_2942096 
crossref_primary_10_1109_TASE_2021_3077689 crossref_primary_10_1016_j_neucom_2022_10_037 crossref_primary_10_1109_ACCESS_2019_2906370 crossref_primary_10_1109_TCPMT_2023_3282616 crossref_primary_10_1109_JIOT_2021_3081694 crossref_primary_10_3233_JIFS_220230 crossref_primary_10_1049_ipr2_12152 crossref_primary_10_1007_s13042_023_01774_0 crossref_primary_10_1016_j_artmed_2024_102777 crossref_primary_10_1007_s11571_020_09626_1 crossref_primary_10_1109_TMM_2021_3058050 crossref_primary_10_1016_j_jvcir_2023_103804 crossref_primary_10_1109_ACCESS_2020_2979549 crossref_primary_10_1007_s40747_021_00606_4 crossref_primary_10_1007_s11269_025_04166_x crossref_primary_10_1109_ACCESS_2020_3003939 crossref_primary_10_1109_TIP_2022_3205210 crossref_primary_10_3389_fpls_2020_601250 crossref_primary_10_1109_TETCI_2024_3518613 crossref_primary_10_1007_s11042_023_17345_y crossref_primary_10_1109_TMM_2023_3326289 crossref_primary_10_1007_s00530_022_00961_3 crossref_primary_10_1016_j_eswa_2022_118791 crossref_primary_10_3390_technologies13020053 crossref_primary_10_1016_j_neucom_2021_06_088 crossref_primary_10_2139_ssrn_4193374 crossref_primary_10_1080_08839514_2020_1765110 crossref_primary_10_1016_j_cviu_2019_102794 crossref_primary_10_1109_TIE_2020_2972443 crossref_primary_10_1109_TIP_2019_2917283 crossref_primary_10_1109_TIP_2019_2914577 crossref_primary_10_1109_TIP_2019_2901707 crossref_primary_10_1155_2022_4667640 crossref_primary_10_1007_s11063_020_10211_0 crossref_primary_10_1016_j_jvcir_2019_102650 crossref_primary_10_1016_j_inffus_2023_101949 |
Cites_doi | 10.1007/s11263-005-1838-7 10.1016/j.imavis.2009.11.014 10.1109/ICCV.2013.396 10.1109/CVPR.2014.223 10.1109/WACV.2017.26 10.1145/2733373.2806222 10.1109/ICCV.2017.316 10.1109/TPAMI.2012.59 10.1109/ICCV.2015.368 10.1109/CVPR.2016.219 10.1109/CVPRW.2017.161 10.1109/TCYB.2016.2519448 10.1109/ICCV.2015.522 10.1007/978-3-319-46493-0_45 10.1109/CVPR.2015.7298594 10.5244/C.22.99 10.1109/ICCV.2013.441 10.1109/CVPR.2005.177 10.1109/CVPR.2015.7299059 10.21236/ADA623249 10.1109/ICCV.2015.222 10.1109/TMM.2016.2626959 10.1109/CVPR.2017.337 10.1109/ICCV.2015.512 10.1109/TMM.2017.2666540 10.1109/ICCV.2011.6126543 10.1145/1291233.1291311 10.1007/978-3-540-88688-4_48 10.1162/neco.1997.9.8.1735 10.1109/CVPR.2017.172 10.1109/ICCV.2015.510 10.1109/CVPR.2016.213 10.1109/CVPR.2015.7298676 10.1109/CVPR.2016.331 10.1109/CVPR.2015.7298935 10.1016/j.sigpro.2015.10.035 10.1109/WACV.2017.22 10.1109/CVPR.2011.5995407 10.1109/WACV.2017.27 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
DBID | 97E RIA RIE AAYXX CITATION NPM 7SC 7SP 8FD JQ2 L7M L~C L~D 7X8 |
DOI | 10.1109/TIP.2017.2778563 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005–Present IEEE All-Society Periodicals Package (ASPP) 1998–Present IEEE Electronic Library (IEL) CrossRef PubMed Computer and Information Systems Abstracts Electronics & Communications Abstracts Technology Research Database ProQuest Computer Science Collection Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Academic Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitle | CrossRef PubMed Technology Research Database Computer and Information Systems Abstracts – Academic Electronics & Communications Abstracts ProQuest Computer Science Collection Computer and Information Systems Abstracts Advanced Technologies Database with Aerospace Computer and Information Systems Abstracts Professional MEDLINE - Academic |
DatabaseTitleList | PubMed MEDLINE - Academic Technology Research Database |
Database_xml | – sequence: 1 dbid: NPM name: PubMed url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed sourceTypes: Index Database – sequence: 2 dbid: RIE name: IEEE Electronic Library (IEL) url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Applied Sciences Engineering |
EISSN | 1941-0042 |
EndPage | 1360 |
ExternalDocumentID | 29990061 10_1109_TIP_2017_2778563 8123939 |
Genre | orig-research Journal Article |
GrantInformation_xml | – fundername: External Cooperation Program of BIC Chinese Academy of Sciences grantid: 172644KYSB20150019; 172644KYSB20160033 funderid: 10.13039/501100002367 – fundername: Shenzhen Basic Research Program grantid: JCYJ20150925163005055; JCYJ20160229193541167 – fundername: National Natural Science Foundation of China grantid: U1613211; 61633021; 61502470 funderid: 10.13039/501100001809 |
GroupedDBID | --- -~X .DC 0R~ 29I 4.4 53G 5GY 5VS 6IK 97E AAJGR AARMG AASAJ AAWTH ABAZT ABFSI ABQJQ ABVLG ACGFO ACGFS ACIWK AENEX AETIX AGQYO AGSQL AHBIQ AI. AIBXA AKJIK AKQYR ALLEH ALMA_UNASSIGNED_HOLDINGS ASUFR ATWAV BEFXN BFFAM BGNUA BKEBE BPEOZ CS3 DU5 E.L EBS EJD F5P HZ~ H~9 ICLAB IFIPE IFJZH IPLJI JAVBF LAI M43 MS~ O9- OCL P2P RIA RIE RNS TAE TN5 VH1 AAYOK AAYXX CITATION RIG NPM 7SC 7SP 8FD JQ2 L7M L~C L~D 7X8 |
IEDL.DBID | RIE |
ISSN | 1057-7149 1941-0042 |
IngestDate | Fri Jul 11 07:29:54 EDT 2025 Sun Jun 29 16:07:01 EDT 2025 Thu Apr 03 07:02:04 EDT 2025 Tue Jul 01 02:03:16 EDT 2025 Thu Apr 24 22:54:41 EDT 2025 Wed Aug 27 02:52:25 EDT 2025 |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 3 |
Language | English |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html |
LinkModel | DirectLink |
Notes | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 content type line 23 |
ORCID | 0000-0002-1889-2567 |
PMID | 29990061 |
PQID | 1980940471 |
PQPubID | 85429 |
PageCount | 14 |
ParticipantIDs | proquest_miscellaneous_2068344413 pubmed_primary_29990061 proquest_journals_1980940471 crossref_citationtrail_10_1109_TIP_2017_2778563 crossref_primary_10_1109_TIP_2017_2778563 ieee_primary_8123939 |
ProviderPackageCode | CITATION AAYXX |
PublicationCentury | 2000 |
PublicationDate | 2018-03-01 |
PublicationDateYYYYMMDD | 2018-03-01 |
PublicationDate_xml | – month: 03 year: 2018 text: 2018-03-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | United States |
PublicationPlace_xml | – name: United States – name: New York |
PublicationTitle | IEEE transactions on image processing |
PublicationTitleAbbrev | TIP |
PublicationTitleAlternate | IEEE Trans Image Process |
PublicationYear | 2018 |
Publisher | IEEE The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
References | ref13 varol (ref67) 2016 yeung (ref12) 2015 ref15 peng (ref20) 2014 ref58 ref14 ref53 soomro (ref60) 2012 ballas (ref52) 2016 ref10 wang (ref38) 2017 tompson (ref57) 2014 ref17 ref16 diba (ref66) 2016 ref19 ref18 wang (ref55) 2016 srivastava (ref6) 2015 kar (ref36) 2016 ref51 diba (ref43) 2016 ref50 bahdanau (ref46) 2014 wang (ref42) 2016 ref48 xu (ref49) 2015 zhu (ref69) 2017 qiu (ref68) 2016 ref8 ref7 ref4 peng (ref59) 2014 ref3 simonyan (ref9) 2014 feichtenhofer (ref44) 2016 team (ref63) 2016 ref40 ref35 ref34 ref37 ref31 ref30 ref33 ref32 sharma (ref54) 2015 ref2 ref1 ref39 ma (ref45) 2017 ref71 ref70 krizhevsky (ref26) 2012 ref72 ref24 ng (ref5) 2015 ref23 cherian (ref41) 2017 simonyan (ref27) 2014 ref22 ref65 ref21 ref28 wang (ref29) 2015 bazzani (ref56) 2017 li (ref11) 2016 graves (ref47) 2014 ref62 ref61 he (ref25) 2015 ding (ref64) 2014 |
References_xml | – ident: ref14 doi: 10.1007/s11263-005-1838-7 – year: 2014 ident: ref64 publication-title: Theano-based large-scale visual recognition with multiple gpus – ident: ref1 doi: 10.1016/j.imavis.2009.11.014 – start-page: 2048 year: 2015 ident: ref49 article-title: Show, attend and tell: Neural image caption generation with visual attention publication-title: Proc ICML – ident: ref62 doi: 10.1109/ICCV.2013.396 – start-page: 1799 year: 2014 ident: ref57 article-title: Joint training of a convolutional network and a graphical model for human pose estimation publication-title: Proc NIPS – year: 2017 ident: ref41 publication-title: Second-order temporal pooling for action recognition – ident: ref4 doi: 10.1109/CVPR.2014.223 – year: 2015 ident: ref54 publication-title: Action recognition using visual attention – start-page: 843 year: 2015 ident: ref6 article-title: Unsupervised learning of video representations using LSTMs publication-title: Proc ICML – year: 2014 ident: ref20 publication-title: Bag of visual words and fusion methods for action recognition Comprehensive study and good practice – year: 2016 ident: ref55 publication-title: Hierarchical attention network for action recognition in videos – ident: ref37 doi: 10.1109/WACV.2017.26 – start-page: 1097 year: 2012 ident: ref26 article-title: ImageNet classification with deep convolutional neural networks publication-title: Proc NIPS – ident: ref51 doi: 10.1145/2733373.2806222 – year: 2016 ident: ref67 publication-title: Long-term temporal convolutions for action recognition – ident: ref35 doi: 10.1109/ICCV.2017.316 – year: 2016 ident: ref11 publication-title: Videolstm convolves attends and flows for action recognition – ident: ref3 doi: 10.1109/TPAMI.2012.59 – ident: ref71 doi: 10.1109/ICCV.2015.368 – ident: ref33 doi: 10.1109/CVPR.2016.219 – start-page: 1 year: 2016 ident: ref52 article-title: Delving deeper into convolutional networks for learning video representations publication-title: Proc ICLR – ident: ref31 doi: 10.1109/CVPRW.2017.161 – ident: ref23 doi: 10.1109/TCYB.2016.2519448 – ident: ref7 doi: 10.1109/ICCV.2015.522 – ident: ref72 doi: 10.1007/978-3-319-46493-0_45 – ident: ref28 doi: 10.1109/CVPR.2015.7298594 – year: 2015 ident: ref12 publication-title: Every moment counts Dense detailed labeling of actions in complex videos – ident: ref13 doi: 10.5244/C.22.99 – start-page: 1 year: 2017 ident: ref56 article-title: Recurrent mixture density network for spatiotemporal visual attention publication-title: Proc ICLR – ident: ref17 doi: 10.1109/ICCV.2013.441 – ident: ref19 doi: 10.1109/CVPR.2005.177 – ident: ref22 doi: 10.1109/CVPR.2015.7299059 – year: 2017 ident: ref69 publication-title: Hidden two-stream convolutional networks for action recognition – ident: ref2 doi: 10.21236/ADA623249 – ident: ref58 doi: 10.1109/ICCV.2015.222 – year: 2015 ident: ref5 publication-title: Beyond short snippets Deep networks for video classification – year: 2016 ident: ref66 publication-title: Efficient two-stream motion and appearance 3D CNNs for video classification – ident: ref24 doi: 10.1109/TMM.2016.2626959 – ident: ref32 doi: 10.1109/CVPR.2017.337 – ident: ref50 doi: 10.1109/ICCV.2015.512 – year: 2016 ident: ref43 publication-title: Deep Temporal Linear Encoding Networks – ident: ref30 doi: 10.1109/TMM.2017.2666540 – year: 2015 ident: ref29 publication-title: Towards good practices for very deep two-stream convnets – ident: ref61 doi: 10.1109/ICCV.2011.6126543 – year: 2014 ident: ref27 publication-title: Very Deep Convolutional 
Networks for Large-scale Image Recognition – year: 2017 ident: ref38 publication-title: Action representation using classifier decision boundaries – year: 2014 ident: ref46 publication-title: Neural machine translation by jointly learning to align and translate – year: 2016 ident: ref36 publication-title: AdaScan Adaptive scan pooling in deep convolutional neural networks for human action recognition in videos – start-page: 20 year: 2016 ident: ref42 article-title: Temporal segment networks: Towards good practices for deep action recognition publication-title: Proc ECCV – ident: ref15 doi: 10.1145/1291233.1291311 – ident: ref18 doi: 10.1007/978-3-540-88688-4_48 – ident: ref10 doi: 10.1162/neco.1997.9.8.1735 – ident: ref40 doi: 10.1109/CVPR.2017.172 – ident: ref8 doi: 10.1109/ICCV.2015.510 – ident: ref34 doi: 10.1109/CVPR.2016.213 – start-page: 581 year: 2014 ident: ref59 article-title: Action recognition with stacked fisher vectors publication-title: Proc ECCV – year: 2012 ident: ref60 publication-title: Ucf101 A Dataset of 101 Human Actions Classes from Videos in the Wild – ident: ref70 doi: 10.1109/CVPR.2015.7298676 – ident: ref65 doi: 10.1109/CVPR.2016.331 – year: 2016 ident: ref68 publication-title: Deep Quantization Encoding Convolutional Activations with Deep Generative Model – ident: ref48 doi: 10.1109/CVPR.2015.7298935 – year: 2014 ident: ref9 publication-title: Two-stream convolutional networks for action recognition in videos – ident: ref21 doi: 10.1016/j.sigpro.2015.10.035 – year: 2017 ident: ref45 publication-title: TS-LSTM and temporal-inception Exploiting spatiotemporal dynamics for activity recognition – start-page: 1764 year: 2014 ident: ref47 article-title: Towards end-to-end speech recognition with recurrent neural networks publication-title: Proc ICML – ident: ref39 doi: 10.1109/WACV.2017.22 – ident: ref16 doi: 10.1109/CVPR.2011.5995407 – year: 2015 ident: ref25 publication-title: Deep residual learning for image recognition – ident: ref53 doi: 10.1109/WACV.2017.27 – start-page: 3468 year: 2016 ident: ref44 article-title: Spatiotemporal residual networks for video action recognition publication-title: Proc NIPS – year: 2016 ident: ref63 publication-title: Theano A Python framework for fast computation of mathematical expressions |
SSID | ssj0014516 |
SourceID | proquest pubmed crossref ieee |
SourceType | Aggregation Database Index Database Enrichment Source Publisher |
StartPage | 1347 |
SubjectTerms | Action recognition actor-attention regularization attention-driven fusion Computer vision Feature extraction Human motion Image recognition Moving object recognition Neural networks Optical imaging Recognition Recurrent neural networks Regularization RSTAN spatial-temporal attention Three-dimensional displays Videos |
Title | Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos |
URI | https://ieeexplore.ieee.org/document/8123939 https://www.ncbi.nlm.nih.gov/pubmed/29990061 https://www.proquest.com/docview/1980940471 https://www.proquest.com/docview/2068344413 |
Volume | 27 |
hasFullText | 1 |
inHoldings | 1 |
linkProvider | IEEE |