Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos

Bibliographic Details
Published in IEEE transactions on image processing Vol. 27; no. 3; pp. 1347-1360
Main Authors Du, Wenbin, Wang, Yali, Qiao, Yu
Format Journal Article
Language English
Published United States IEEE 01.03.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Recent years have witnessed the growing popularity of recurrent neural networks (RNNs) for action recognition in videos. However, videos are high-dimensional and contain rich human dynamics at various motion scales, which makes it difficult for traditional RNNs to capture complex action information. In this paper, we propose a novel recurrent spatial-temporal attention network (RSTAN) to address this challenge, in which a spatial-temporal attention mechanism adaptively identifies key features from the global video context for each time-step prediction of the RNN. More specifically, we make three main contributions. First, we reinforce the classical long short-term memory (LSTM) with a novel spatial-temporal attention module. At each time step, this module automatically learns a spatial-temporal action representation from all sampled video frames that is compact and highly relevant to the prediction at the current step. Second, we design an attention-driven appearance-motion fusion strategy that integrates appearance and motion LSTMs into a unified framework, where the LSTMs and their spatial-temporal attention modules in the two streams are jointly trained in an end-to-end fashion. Third, we develop an actor-attention regularization for RSTAN, which guides the attention mechanism to focus on the important action regions around actors. We evaluate the proposed RSTAN on the benchmark UCF101, HMDB51, and JHMDB datasets. The experimental results show that RSTAN outperforms other recent RNN-based approaches on UCF101 and HMDB51, and achieves state-of-the-art performance on JHMDB.
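For readers who want a concrete picture of the mechanism summarized in the abstract, below is a minimal PyTorch-style sketch of a spatial-temporal attention module that scores the features of all sampled frames against the current LSTM hidden state and feeds the attended context back into the recurrence. It is an illustrative approximation under assumed names, dimensions, scoring function, and step count; it is not the authors' released implementation, and the paper's two-stream appearance-motion fusion and actor-attention regularization are not reproduced here.

```python
# Hypothetical sketch of recurrent spatial-temporal attention for action
# recognition; module names, dimensions, and the additive scoring function
# are illustrative assumptions, not the paper's exact design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialTemporalAttention(nn.Module):
    """Attend over the T*N spatial-temporal feature vectors of all sampled frames."""

    def __init__(self, feat_dim, hidden_dim, attn_dim=256):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, attn_dim)
        self.proj_hidden = nn.Linear(hidden_dim, attn_dim)
        self.score = nn.Linear(attn_dim, 1)

    def forward(self, feats, h):
        # feats: (B, T*N, feat_dim) CNN features from all sampled frames
        # h:     (B, hidden_dim)    current LSTM hidden state
        e = torch.tanh(self.proj_feat(feats) + self.proj_hidden(h).unsqueeze(1))
        alpha = F.softmax(self.score(e).squeeze(-1), dim=1)        # (B, T*N)
        context = torch.bmm(alpha.unsqueeze(1), feats).squeeze(1)  # (B, feat_dim)
        return context, alpha


class AttentionLSTMStream(nn.Module):
    """One stream (e.g. appearance): an LSTMCell driven by the attended context."""

    def __init__(self, feat_dim, hidden_dim, num_classes):
        super().__init__()
        self.attn = SpatialTemporalAttention(feat_dim, hidden_dim)
        self.lstm = nn.LSTMCell(feat_dim, hidden_dim)
        self.classifier = nn.Linear(hidden_dim, num_classes)

    def forward(self, feats, steps=10):
        batch = feats.size(0)
        h = feats.new_zeros(batch, self.lstm.hidden_size)
        c = feats.new_zeros(batch, self.lstm.hidden_size)
        logits = []
        for _ in range(steps):
            # pick key features from the global video context for this step
            context, _ = self.attn(feats, h)
            h, c = self.lstm(context, (h, c))
            logits.append(self.classifier(h))
        # average the per-step class predictions
        return torch.stack(logits, dim=1).mean(dim=1)
```

In a full RSTAN-style setup, a second stream over optical-flow features would mirror this appearance stream, the paper's attention-driven fusion would combine the two, and the actor-attention regularizer would additionally encourage the attention maps to concentrate on regions around the actors; none of those pieces appear in this sketch.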
Author Wenbin Du
Yu Qiao
Yali Wang
Author_xml – sequence: 1
  givenname: Wenbin
  surname: Du
  fullname: Du, Wenbin
– sequence: 2
  givenname: Yali
  surname: Wang
  fullname: Wang, Yali
– sequence: 3
  givenname: Yu
  orcidid: 0000-0002-1889-2567
  surname: Qiao
  fullname: Qiao, Yu
BackLink https://www.ncbi.nlm.nih.gov/pubmed/29990061 (View this record in MEDLINE/PubMed)
CODEN IIPRE4
CitedBy_id crossref_primary_10_1007_s10489_024_06001_z
crossref_primary_10_3390_s22176554
crossref_primary_10_1109_TIP_2021_3100556
crossref_primary_10_1007_s11042_022_12219_1
crossref_primary_10_1109_TNSRE_2020_3039297
crossref_primary_10_1016_j_ins_2022_05_092
crossref_primary_10_1007_s10845_023_02318_7
crossref_primary_10_1016_j_neucom_2024_127975
crossref_primary_10_1007_s12517_021_09229_y
crossref_primary_10_1016_j_eswa_2022_117730
crossref_primary_10_1109_TNNLS_2020_2978613
crossref_primary_10_1007_s00382_024_07348_2
crossref_primary_10_3233_IA_190021
crossref_primary_10_3390_s21175921
crossref_primary_10_1016_j_neucom_2024_127973
crossref_primary_10_1109_TSG_2022_3166600
crossref_primary_10_1155_2021_6690606
crossref_primary_10_1109_TIP_2023_3270105
crossref_primary_10_1109_TMM_2021_3050058
crossref_primary_10_1007_s00521_019_04516_y
crossref_primary_10_1016_j_imavis_2019_08_009
crossref_primary_10_3390_s22134863
crossref_primary_10_1016_j_ins_2022_11_012
crossref_primary_10_1007_s00371_023_02988_7
crossref_primary_10_1038_s41538_025_00376_0
crossref_primary_10_1109_TMM_2022_3148588
crossref_primary_10_1007_s00138_018_0956_5
crossref_primary_10_1109_TSTE_2019_2897136
crossref_primary_10_1088_1361_6501_ad3ea6
crossref_primary_10_1109_TIP_2021_3113570
crossref_primary_10_1016_j_neucom_2021_03_120
crossref_primary_10_1007_s11220_022_00399_x
crossref_primary_10_1109_TIM_2022_3204100
crossref_primary_10_1016_j_eswa_2022_118484
crossref_primary_10_1109_TIM_2023_3320734
crossref_primary_10_1007_s41095_022_0271_y
crossref_primary_10_3390_s21144720
crossref_primary_10_3390_rs15030842
crossref_primary_10_1109_TIP_2021_3058599
crossref_primary_10_1109_TIP_2020_2987425
crossref_primary_10_1016_j_patcog_2023_110071
crossref_primary_10_1007_s11042_022_13068_8
crossref_primary_10_1016_j_knosys_2022_108786
crossref_primary_10_1016_j_semcancer_2022_08_005
crossref_primary_10_1109_JSEN_2020_3019258
crossref_primary_10_1109_TNNLS_2021_3105184
crossref_primary_10_3390_rs16132427
crossref_primary_10_1016_j_dsp_2022_103449
crossref_primary_10_1007_s11042_023_15022_8
crossref_primary_10_1155_2021_3155357
crossref_primary_10_1109_TITS_2022_3208952
crossref_primary_10_1109_ACCESS_2022_3206449
crossref_primary_10_1109_TMM_2018_2862341
crossref_primary_10_1371_journal_pone_0244647
crossref_primary_10_1109_TMM_2020_3011317
crossref_primary_10_1007_s10470_018_1306_2
crossref_primary_10_1007_s11042_023_14355_8
crossref_primary_10_1109_TIP_2020_2989864
crossref_primary_10_1109_TIM_2023_3302372
crossref_primary_10_1016_j_neucom_2020_06_032
crossref_primary_10_1109_TCSVT_2019_2958188
crossref_primary_10_1016_j_cie_2022_108559
crossref_primary_10_1016_j_neucom_2020_12_020
crossref_primary_10_1007_s00521_023_08559_0
crossref_primary_10_1016_j_patcog_2019_03_010
crossref_primary_10_1109_JSEN_2022_3143705
crossref_primary_10_1002_int_22591
crossref_primary_10_1109_TVT_2023_3313593
crossref_primary_10_1109_ACCESS_2021_3132787
crossref_primary_10_3390_electronics12173729
crossref_primary_10_1016_j_knosys_2024_112480
crossref_primary_10_3390_s25020497
crossref_primary_10_1109_TMM_2019_2953814
crossref_primary_10_1109_TMM_2021_3056892
crossref_primary_10_1016_j_egyr_2024_11_057
crossref_primary_10_1007_s11063_020_10248_1
crossref_primary_10_1109_TMM_2024_3386339
crossref_primary_10_12677_mos_2025_141002
crossref_primary_10_1371_journal_pone_0265115
crossref_primary_10_1002_hbm_26399
crossref_primary_10_1016_j_phycom_2021_101584
crossref_primary_10_3390_bioengineering11121180
crossref_primary_10_1109_TITS_2019_2942096
crossref_primary_10_1109_TASE_2021_3077689
crossref_primary_10_1016_j_neucom_2022_10_037
crossref_primary_10_1109_ACCESS_2019_2906370
crossref_primary_10_1109_TCPMT_2023_3282616
crossref_primary_10_1109_JIOT_2021_3081694
crossref_primary_10_3233_JIFS_220230
crossref_primary_10_1049_ipr2_12152
crossref_primary_10_1007_s13042_023_01774_0
crossref_primary_10_1016_j_artmed_2024_102777
crossref_primary_10_1007_s11571_020_09626_1
crossref_primary_10_1109_TMM_2021_3058050
crossref_primary_10_1016_j_jvcir_2023_103804
crossref_primary_10_1109_ACCESS_2020_2979549
crossref_primary_10_1007_s40747_021_00606_4
crossref_primary_10_1007_s11269_025_04166_x
crossref_primary_10_1109_ACCESS_2020_3003939
crossref_primary_10_1109_TIP_2022_3205210
crossref_primary_10_3389_fpls_2020_601250
crossref_primary_10_1109_TETCI_2024_3518613
crossref_primary_10_1007_s11042_023_17345_y
crossref_primary_10_1109_TMM_2023_3326289
crossref_primary_10_1007_s00530_022_00961_3
crossref_primary_10_1016_j_eswa_2022_118791
crossref_primary_10_3390_technologies13020053
crossref_primary_10_1016_j_neucom_2021_06_088
crossref_primary_10_2139_ssrn_4193374
crossref_primary_10_1080_08839514_2020_1765110
crossref_primary_10_1016_j_cviu_2019_102794
crossref_primary_10_1109_TIE_2020_2972443
crossref_primary_10_1109_TIP_2019_2917283
crossref_primary_10_1109_TIP_2019_2914577
crossref_primary_10_1109_TIP_2019_2901707
crossref_primary_10_1155_2022_4667640
crossref_primary_10_1007_s11063_020_10211_0
crossref_primary_10_1016_j_jvcir_2019_102650
crossref_primary_10_1016_j_inffus_2023_101949
Cites_doi 10.1007/s11263-005-1838-7
10.1016/j.imavis.2009.11.014
10.1109/ICCV.2013.396
10.1109/CVPR.2014.223
10.1109/WACV.2017.26
10.1145/2733373.2806222
10.1109/ICCV.2017.316
10.1109/TPAMI.2012.59
10.1109/ICCV.2015.368
10.1109/CVPR.2016.219
10.1109/CVPRW.2017.161
10.1109/TCYB.2016.2519448
10.1109/ICCV.2015.522
10.1007/978-3-319-46493-0_45
10.1109/CVPR.2015.7298594
10.5244/C.22.99
10.1109/ICCV.2013.441
10.1109/CVPR.2005.177
10.1109/CVPR.2015.7299059
10.21236/ADA623249
10.1109/ICCV.2015.222
10.1109/TMM.2016.2626959
10.1109/CVPR.2017.337
10.1109/ICCV.2015.512
10.1109/TMM.2017.2666540
10.1109/ICCV.2011.6126543
10.1145/1291233.1291311
10.1007/978-3-540-88688-4_48
10.1162/neco.1997.9.8.1735
10.1109/CVPR.2017.172
10.1109/ICCV.2015.510
10.1109/CVPR.2016.213
10.1109/CVPR.2015.7298676
10.1109/CVPR.2016.331
10.1109/CVPR.2015.7298935
10.1016/j.sigpro.2015.10.035
10.1109/WACV.2017.22
10.1109/CVPR.2011.5995407
10.1109/WACV.2017.27
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DBID 97E
RIA
RIE
AAYXX
CITATION
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
DOI 10.1109/TIP.2017.2778563
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Electronic Library (IEL)
CrossRef
PubMed
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitle CrossRef
PubMed
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
MEDLINE - Academic
DatabaseTitleList PubMed
MEDLINE - Academic
Technology Research Database

Database_xml – sequence: 1
  dbid: NPM
  name: PubMed
  url: https://proxy.k.utb.cz/login?url=http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=PubMed
  sourceTypes: Index Database
– sequence: 2
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Applied Sciences
Engineering
EISSN 1941-0042
EndPage 1360
ExternalDocumentID 29990061
10_1109_TIP_2017_2778563
8123939
Genre orig-research
Journal Article
GrantInformation_xml – fundername: External Cooperation Program of BIC Chinese Academy of Sciences
  grantid: 172644KYSB20150019; 172644KYSB20160033
  funderid: 10.13039/501100002367
– fundername: Shenzhen Basic Research Program
  grantid: JCYJ20150925163005055; JCYJ20160229193541167
– fundername: National Natural Science Foundation of China
  grantid: U1613211; 61633021; 61502470
  funderid: 10.13039/501100001809
GroupedDBID ---
-~X
.DC
0R~
29I
4.4
53G
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABFSI
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
E.L
EBS
EJD
F5P
HZ~
H~9
ICLAB
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
MS~
O9-
OCL
P2P
RIA
RIE
RNS
TAE
TN5
VH1
AAYOK
AAYXX
CITATION
RIG
NPM
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
7X8
ID FETCH-LOGICAL-c413t-86e4f0a99e701feeee6c457ea3d7aaf9f83ccca725601270436c5a2ef82073733
IEDL.DBID RIE
ISSN 1057-7149
1941-0042
IngestDate Fri Jul 11 07:29:54 EDT 2025
Sun Jun 29 16:07:01 EDT 2025
Thu Apr 03 07:02:04 EDT 2025
Tue Jul 01 02:03:16 EDT 2025
Thu Apr 24 22:54:41 EDT 2025
Wed Aug 27 02:52:25 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 3
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-c413t-86e4f0a99e701feeee6c457ea3d7aaf9f83ccca725601270436c5a2ef82073733
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
content type line 23
ORCID 0000-0002-1889-2567
PMID 29990061
PQID 1980940471
PQPubID 85429
PageCount 14
ParticipantIDs proquest_miscellaneous_2068344413
pubmed_primary_29990061
proquest_journals_1980940471
crossref_citationtrail_10_1109_TIP_2017_2778563
crossref_primary_10_1109_TIP_2017_2778563
ieee_primary_8123939
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2018-03-01
PublicationDateYYYYMMDD 2018-03-01
PublicationDate_xml – month: 03
  year: 2018
  text: 2018-03-01
  day: 01
PublicationDecade 2010
PublicationPlace United States
PublicationPlace_xml – name: United States
– name: New York
PublicationTitle IEEE transactions on image processing
PublicationTitleAbbrev TIP
PublicationTitleAlternate IEEE Trans Image Process
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
varol (ref67) 2016
yeung (ref12) 2015
ref15
peng (ref20) 2014
ref58
ref14
ref53
soomro (ref60) 2012
ballas (ref52) 2016
ref10
wang (ref38) 2017
tompson (ref57) 2014
ref17
ref16
diba (ref66) 2016
ref19
ref18
wang (ref55) 2016
srivastava (ref6) 2015
kar (ref36) 2016
ref51
diba (ref43) 2016
ref50
bahdanau (ref46) 2014
wang (ref42) 2016
ref48
xu (ref49) 2015
zhu (ref69) 2017
qiu (ref68) 2016
ref8
ref7
ref4
peng (ref59) 2014
ref3
simonyan (ref9) 2014
feichtenhofer (ref44) 2016
team (ref63) 2016
ref40
ref35
ref34
ref37
ref31
ref30
ref33
ref32
sharma (ref54) 2015
ref2
ref1
ref39
ma (ref45) 2017
ref71
ref70
krizhevsky (ref26) 2012
ref72
ref24
ng (ref5) 2015
ref23
cherian (ref41) 2017
simonyan (ref27) 2014
ref22
ref65
ref21
ref28
wang (ref29) 2015
bazzani (ref56) 2017
li (ref11) 2016
graves (ref47) 2014
ref62
ref61
he (ref25) 2015
ding (ref64) 2014
References_xml – ident: ref14
  doi: 10.1007/s11263-005-1838-7
– year: 2014
  ident: ref64
  publication-title: Theano-based large-scale visual recognition with multiple GPUs
– ident: ref1
  doi: 10.1016/j.imavis.2009.11.014
– start-page: 2048
  year: 2015
  ident: ref49
  article-title: Show, attend and tell: Neural image caption generation with visual attention
  publication-title: Proc ICML
– ident: ref62
  doi: 10.1109/ICCV.2013.396
– start-page: 1799
  year: 2014
  ident: ref57
  article-title: Joint training of a convolutional network and a graphical model for human pose estimation
  publication-title: Proc NIPS
– year: 2017
  ident: ref41
  publication-title: Second-order temporal pooling for action recognition
– ident: ref4
  doi: 10.1109/CVPR.2014.223
– year: 2015
  ident: ref54
  publication-title: Action recognition using visual attention
– start-page: 843
  year: 2015
  ident: ref6
  article-title: Unsupervised learning of video representations using LSTMs
  publication-title: Proc ICML
– year: 2014
  ident: ref20
  publication-title: Bag of visual words and fusion methods for action recognition: Comprehensive study and good practice
– year: 2016
  ident: ref55
  publication-title: Hierarchical attention network for action recognition in videos
– ident: ref37
  doi: 10.1109/WACV.2017.26
– start-page: 1097
  year: 2012
  ident: ref26
  article-title: ImageNet classification with deep convolutional neural networks
  publication-title: Proc NIPS
– ident: ref51
  doi: 10.1145/2733373.2806222
– year: 2016
  ident: ref67
  publication-title: Long-term temporal convolutions for action recognition
– ident: ref35
  doi: 10.1109/ICCV.2017.316
– year: 2016
  ident: ref11
  publication-title: VideoLSTM convolves, attends and flows for action recognition
– ident: ref3
  doi: 10.1109/TPAMI.2012.59
– ident: ref71
  doi: 10.1109/ICCV.2015.368
– ident: ref33
  doi: 10.1109/CVPR.2016.219
– start-page: 1
  year: 2016
  ident: ref52
  article-title: Delving deeper into convolutional networks for learning video representations
  publication-title: Proc ICLR
– ident: ref31
  doi: 10.1109/CVPRW.2017.161
– ident: ref23
  doi: 10.1109/TCYB.2016.2519448
– ident: ref7
  doi: 10.1109/ICCV.2015.522
– ident: ref72
  doi: 10.1007/978-3-319-46493-0_45
– ident: ref28
  doi: 10.1109/CVPR.2015.7298594
– year: 2015
  ident: ref12
  publication-title: Every moment counts: Dense detailed labeling of actions in complex videos
– ident: ref13
  doi: 10.5244/C.22.99
– start-page: 1
  year: 2017
  ident: ref56
  article-title: Recurrent mixture density network for spatiotemporal visual attention
  publication-title: Proc ICLR
– ident: ref17
  doi: 10.1109/ICCV.2013.441
– ident: ref19
  doi: 10.1109/CVPR.2005.177
– ident: ref22
  doi: 10.1109/CVPR.2015.7299059
– year: 2017
  ident: ref69
  publication-title: Hidden two-stream convolutional networks for action recognition
– ident: ref2
  doi: 10.21236/ADA623249
– ident: ref58
  doi: 10.1109/ICCV.2015.222
– year: 2015
  ident: ref5
  publication-title: Beyond short snippets: Deep networks for video classification
– year: 2016
  ident: ref66
  publication-title: Efficient two-stream motion and appearance 3D CNNs for video classification
– ident: ref24
  doi: 10.1109/TMM.2016.2626959
– ident: ref32
  doi: 10.1109/CVPR.2017.337
– ident: ref50
  doi: 10.1109/ICCV.2015.512
– year: 2016
  ident: ref43
  publication-title: Deep Temporal Linear Encoding Networks
– ident: ref30
  doi: 10.1109/TMM.2017.2666540
– year: 2015
  ident: ref29
  publication-title: Towards good practices for very deep two-stream convnets
– ident: ref61
  doi: 10.1109/ICCV.2011.6126543
– year: 2014
  ident: ref27
  publication-title: Very Deep Convolutional Networks for Large-scale Image Recognition
– year: 2017
  ident: ref38
  publication-title: Action representation using classifier decision boundaries
– year: 2014
  ident: ref46
  publication-title: Neural machine translation by jointly learning to align and translate
– year: 2016
  ident: ref36
  publication-title: AdaScan: Adaptive scan pooling in deep convolutional neural networks for human action recognition in videos
– start-page: 20
  year: 2016
  ident: ref42
  article-title: Temporal segment networks: Towards good practices for deep action recognition
  publication-title: Proc ECCV
– ident: ref15
  doi: 10.1145/1291233.1291311
– ident: ref18
  doi: 10.1007/978-3-540-88688-4_48
– ident: ref10
  doi: 10.1162/neco.1997.9.8.1735
– ident: ref40
  doi: 10.1109/CVPR.2017.172
– ident: ref8
  doi: 10.1109/ICCV.2015.510
– ident: ref34
  doi: 10.1109/CVPR.2016.213
– start-page: 581
  year: 2014
  ident: ref59
  article-title: Action recognition with stacked fisher vectors
  publication-title: Proc ECCV
– year: 2012
  ident: ref60
  publication-title: UCF101: A Dataset of 101 Human Actions Classes From Videos in The Wild
– ident: ref70
  doi: 10.1109/CVPR.2015.7298676
– ident: ref65
  doi: 10.1109/CVPR.2016.331
– year: 2016
  ident: ref68
  publication-title: Deep Quantization: Encoding Convolutional Activations with Deep Generative Model
– ident: ref48
  doi: 10.1109/CVPR.2015.7298935
– year: 2014
  ident: ref9
  publication-title: Two-stream convolutional networks for action recognition in videos
– ident: ref21
  doi: 10.1016/j.sigpro.2015.10.035
– year: 2017
  ident: ref45
  publication-title: TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition
– start-page: 1764
  year: 2014
  ident: ref47
  article-title: Towards end-to-end speech recognition with recurrent neural networks
  publication-title: Proc ICML
– ident: ref39
  doi: 10.1109/WACV.2017.22
– ident: ref16
  doi: 10.1109/CVPR.2011.5995407
– year: 2015
  ident: ref25
  publication-title: Deep residual learning for image recognition
– ident: ref53
  doi: 10.1109/WACV.2017.27
– start-page: 3468
  year: 2016
  ident: ref44
  article-title: Spatiotemporal residual networks for video action recognition
  publication-title: Proc NIPS
– year: 2016
  ident: ref63
  publication-title: Theano: A Python framework for fast computation of mathematical expressions
SSID ssj0014516
Score 2.6062515
Snippet Recent years have witnessed the popularity of using recurrent neural network (RNN) for action recognition in videos. However, videos are of high dimensionality...
SourceID proquest
pubmed
crossref
ieee
SourceType Aggregation Database
Index Database
Enrichment Source
Publisher
StartPage 1347
SubjectTerms Action recognition
actor-attention regularization
attention-driven fusion
Computer vision
Feature extraction
Human motion
Image recognition
Moving object recognition
Neural networks
Optical imaging
Recognition
Recurrent neural networks
Regularization
RSTAN
spatial-temporal attention
Three-dimensional displays
Videos
Title Recurrent Spatial-Temporal Attention Network for Action Recognition in Videos
URI https://ieeexplore.ieee.org/document/8123939
https://www.ncbi.nlm.nih.gov/pubmed/29990061
https://www.proquest.com/docview/1980940471
https://www.proquest.com/docview/2068344413
Volume 27
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3BTtwwEB0BJ3ooFNoSoMhIvVRqdrNxHNtHVBXRSosqtFTcIscZS6hVFrHZC1_P2E4iqNqKW6Q4ju2Z0Tx7xm8APsoCOUF7TGdWiLSotSCbEyrFXOaNqstahWIw88vy4rr4fiNuNuDzeBcGEUPyGU78Y4jlN0u79kdlU3JGXHO9CZu0cYt3tcaIgS84GyKbQqaSYP8Qksz0dPHth8_hkpNcSiVK_swFhZoq_4aXwc2c78B8GGDMLvk1WXf1xD78wd340hnswuseb7KzqCBvYAPbPdjpsSfrLXu1B6-eEBPuw_zKH8N74ibmaxaTjqaLyGFFXXVdTJFklzGFnBHuZWfhfgS7GvKR6Pm2ZT9vG1yu3sL1-dfFl4u0L7yQWvJpXapKLFxmtEaZzRxNAktbCImGN9IYp53iliQvw3Yul57F3gqToyM4Ibnk_B1stcsWD4DpPHcz44xtGsI-ymqjuMPcuqx0Dp1IYDrIorI9K7kvjvG7CruTTFckvcpLr-qll8Cn8Yu7yMjxn7b7XgZju375EzgexF31JruqZlp5LkFy1gmcjq_J2HwExbS4XK-o79LXJaFFSuB9VJOxb_Lr2gPCw7__8wi2aWQqpq8dw1Z3v8YPhGe6-iQo8iPCKvBv
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV3BbtQwEB2VcgAOFFoogQJB4oJEdrN2HNvHClFtobtC1Rb1FjnOWKqosojNXvr1HdtJBAgQt0hxHNszo3n2jN8AvJUFcoL2mM2sEFlRa0E2J1SGTLJG1WWtQjGYxbKcXxSfLsXlDrwf78IgYkg-w4l_DLH8Zm23_qhsSs6Ia67vwF3y-4LF21pjzMCXnA2xTSEzScB_CErmero6_eKzuOSESalEyX9xQqGqyt8BZnA0J3uwGIYY80u-TbZdPbE3v7E3_u8cHsHDHnGmx1FFHsMOtvuw16PPtLftzT48-Ima8AAW5_4g3lM3pb5qMWlptoosVtRV18UkyXQZk8hTQr7pcbghkZ4PGUn0fNWmX68aXG-ewMXJx9WHedaXXsgsebUuUyUWLjdao8xnjiaBpS2ERMMbaYzTTnFLspdhQ8ek57G3wjB0BCgkl5w_hd123eIzSDVjbmacsU1D6EdZbRR3yKzLS-fQiQSmgywq2_OS-_IY11XYn-S6IulVXnpVL70E3o1ffI-cHP9oe-BlMLbrlz-Bo0HcVW-0m2qmlWcTJHedwJvxNZmbj6GYFtfbDfVd-soktEgJHEY1Gfsmz649JHz-53--hnvz1eKsOjtdfn4B92mUKiazHcFu92OLLwnddPWroNS3c4bzuQ
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=article&rft.atitle=Recurrent+Spatial-Temporal+Attention+Network+for+Action+Recognition+in+Videos&rft.jtitle=IEEE+transactions+on+image+processing&rft.au=Du%2C+Wenbin&rft.au=Wang%2C+Yali&rft.au=Qiao%2C+Yu&rft.date=2018-03-01&rft.issn=1057-7149&rft.eissn=1941-0042&rft.volume=27&rft.issue=3&rft.spage=1347&rft.epage=1360&rft_id=info:doi/10.1109%2FTIP.2017.2778563&rft.externalDBID=n%2Fa&rft.externalDocID=10_1109_TIP_2017_2778563