FEXNet: Foreground Extraction Network for Human Action Recognition

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 5, pp. 3141-3151
Main Authors: Shen, Zhongwei; Wu, Xiao-Jun; Xu, Tianyang
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2022

Abstract: As most human actions in video sequences embody continuous interactions between foregrounds rather than the background scene, it is essential to disentangle these foregrounds from the background for advanced action recognition systems. In this paper, we therefore propose a Foreground EXtraction (FEX) block to explicitly model foreground clues and achieve effective management of action subjects. The designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the potential feature channels related to action attributes, providing channel-level refinement for the subsequent spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits feature maps into foreground and background: a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled with simple spatial convolutions, mapping the inputs to a consistent feature space. FEX blocks can be inserted into existing 2D CNNs (the resulting networks are denoted FEXNet) for spatiotemporal modeling that concentrates on foreground clues for effective action inference. Experiments on Something-Something V1, V2 and Kinetics400 verify the effectiveness of the proposed method.
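The abstract specifies the FEX block only at a high level: an FE module that gates action-related channels, followed by an SS module that splits features into a temporally modeled foreground and a spatially convolved background. As a rough illustration of that structure, the following PyTorch sketch shows how such a block might be wired into a 2D-CNN video backbone; the SE-style channel gating, the half-and-half channel split, the frame-difference "dynamic enhancement", and all module and parameter names are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a FEX-style block, assuming PyTorch and an input of
# stacked frames shaped (batch * n_frames, channels, H, W), as is typical
# for 2D-CNN video backbones. Internals are guesses from the abstract,
# not the paper's actual design.
import torch
import torch.nn as nn


class ForegroundEnhancement(nn.Module):
    """FE module: highlights feature channels related to action attributes.
    Assumed here to be squeeze-and-excitation-style channel attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze to (NT, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (NT, C, H, W)
        return x * self.gate(x)                             # channel-level refinement


class SceneSegregation(nn.Module):
    """SS module: splits channels into foreground and background parts.
    Foreground gets temporal modeling with a frame-difference ("dynamic")
    term; background gets a plain spatial convolution."""

    def __init__(self, channels: int, n_frames: int):
        super().__init__()
        self.t = n_frames
        fg_c = channels // 2                                 # assumed 50/50 split
        self.fg_temporal = nn.Conv3d(fg_c, fg_c, (3, 1, 1), padding=(1, 0, 0))
        self.bg_spatial = nn.Conv2d(channels - fg_c, channels - fg_c, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (NT, C, H, W)
        nt, c, h, w = x.shape
        fg, bg = x.split([c // 2, c - c // 2], dim=1)
        # Foreground: regroup to (N, C/2, T, H, W) for the temporal conv.
        f = fg.reshape(-1, self.t, c // 2, h, w).transpose(1, 2)
        diff = f - f.roll(1, dims=2)                         # crude motion cue (wraps at t=0)
        f = self.fg_temporal(f + diff)
        fg = f.transpose(1, 2).reshape(nt, c // 2, h, w)
        bg = self.bg_spatial(bg)                             # background: spatial conv only
        return torch.cat([fg, bg], dim=1)


class FEXBlock(nn.Module):
    """FE followed by SS; insertable between stages of an existing 2D CNN."""

    def __init__(self, channels: int, n_frames: int = 8):
        super().__init__()
        self.fe = ForegroundEnhancement(channels)
        self.ss = SceneSegregation(channels, n_frames)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ss(self.fe(x))


# Example: 2 clips of 8 frames with 64-channel 14x14 feature maps.
block = FEXBlock(channels=64, n_frames=8)
out = block(torch.randn(2 * 8, 64, 14, 14))                 # -> (16, 64, 14, 14)
```

The (batch * frames, C, H, W) layout is what lets such a block drop between stages of a 2D backbone without changing the surrounding code, which matches the abstract's claim that FEX blocks can be inserted into existing 2D CNNs.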
Authors:
– Shen, Zhongwei (ORCID: 0000-0002-6701-1965; email: shenzw_cv@163.com), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
– Wu, Xiao-Jun (ORCID: 0000-0002-0310-5778; email: wu_xiaojun@jiangnan.edu.cn), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
– Xu, Tianyang (ORCID: 0000-0002-9015-3128; email: tianyang_xu@163.com), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
CODEN: ITCTEM
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI: 10.1109/TCSVT.2021.3103677
Discipline: Engineering
EISSN: 1558-2205
Genre: Original research
Funding:
– National Key Research and Development Program of China, Grant 2017YFC1601800 (funder ID: 10.13039/501100012166)
– National Natural Science Foundation of China, Grants U1836218, 62020106012, and 61672265 (funder ID: 10.13039/501100001809)
– 111 Project of Ministry of Education of China, Grant B12018
ISSN: 1051-8215
Peer reviewed: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Journal Abbreviation: TCSVT
Subject Terms: action recognition; Convolutional neural networks; Feature extraction; Feature maps; Foreground-related features; Human activity recognition; Human motion; Image recognition; Iron; Modelling; Modules; Solid modeling; spatiotemporal modeling; Spatiotemporal phenomena; Three-dimensional displays; Two dimensional models
URI: https://ieeexplore.ieee.org/document/9509412; https://www.proquest.com/docview/2659346314