FEXNet: Foreground Extraction Network for Human Action Recognition

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 5, pp. 3141-3151
Main Authors: Shen, Zhongwei; Wu, Xiao-Jun; Xu, Tianyang
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.05.2022

Abstract: As most human actions in video sequences embody continuous interactions between foregrounds rather than the background scene, it is essential to disentangle these foregrounds from the background for advanced action recognition systems. In this paper, we therefore propose a Foreground EXtraction (FEX) block to explicitly model foreground clues and achieve effective management of action subjects. The designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the potential feature channels related to action attributes, providing channel-level refinement for the subsequent spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits feature maps into foreground and background: a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled with simple spatial convolutions, mapping the inputs to a consistent feature space. FEX blocks can be inserted into existing 2D CNNs (the resulting networks are denoted FEXNet) for spatiotemporal modeling that concentrates on foreground clues for effective action inference. Experiments on Something-Something V1, V2 and Kinetics400 verify the effectiveness of the proposed method.
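The abstract specifies the FEX block only at a high level: an FE module that gates action-related channels, followed by an SS module that splits features into a temporally modeled foreground and a spatially convolved background. As a rough illustration of that structure, the following PyTorch sketch shows how such a block might be wired into a 2D-CNN video backbone; the SE-style channel gating, the half-and-half channel split, the frame-difference "dynamic enhancement", and all module and parameter names are assumptions made for illustration, not the authors' implementation.

```python
# Minimal sketch of a FEX-style block, assuming PyTorch and an input of
# stacked frames shaped (batch * n_frames, channels, H, W), as is typical
# for 2D-CNN video backbones. Internals are guesses from the abstract,
# not the paper's actual design.
import torch
import torch.nn as nn


class ForegroundEnhancement(nn.Module):
    """FE module: highlights feature channels related to action attributes.
    Assumed here to be squeeze-and-excitation-style channel attention."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),                        # squeeze to (NT, C, 1, 1)
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),                                   # per-channel weights
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (NT, C, H, W)
        return x * self.gate(x)                             # channel-level refinement


class SceneSegregation(nn.Module):
    """SS module: splits channels into foreground and background parts.
    Foreground gets temporal modeling with a frame-difference ("dynamic")
    term; background gets a plain spatial convolution."""

    def __init__(self, channels: int, n_frames: int):
        super().__init__()
        self.t = n_frames
        fg_c = channels // 2                                 # assumed 50/50 split
        self.fg_temporal = nn.Conv3d(fg_c, fg_c, (3, 1, 1), padding=(1, 0, 0))
        self.bg_spatial = nn.Conv2d(channels - fg_c, channels - fg_c, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:     # x: (NT, C, H, W)
        nt, c, h, w = x.shape
        fg, bg = x.split([c // 2, c - c // 2], dim=1)
        # Foreground: regroup to (N, C/2, T, H, W) for the temporal conv.
        f = fg.reshape(-1, self.t, c // 2, h, w).transpose(1, 2)
        diff = f - f.roll(1, dims=2)                         # crude motion cue (wraps at t=0)
        f = self.fg_temporal(f + diff)
        fg = f.transpose(1, 2).reshape(nt, c // 2, h, w)
        bg = self.bg_spatial(bg)                             # background: spatial conv only
        return torch.cat([fg, bg], dim=1)


class FEXBlock(nn.Module):
    """FE followed by SS; insertable between stages of an existing 2D CNN."""

    def __init__(self, channels: int, n_frames: int = 8):
        super().__init__()
        self.fe = ForegroundEnhancement(channels)
        self.ss = SceneSegregation(channels, n_frames)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.ss(self.fe(x))


# Example: 2 clips of 8 frames with 64-channel 14x14 feature maps.
block = FEXBlock(channels=64, n_frames=8)
out = block(torch.randn(2 * 8, 64, 14, 14))                 # -> (16, 64, 14, 14)
```

The (batch * frames, C, H, W) layout is what lets such a block drop between stages of a 2D backbone without changing the surrounding code, which matches the abstract's claim that FEX blocks can be inserted into existing 2D CNNs.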
Authors:
– Shen, Zhongwei (ORCID: 0000-0002-6701-1965; email: shenzw_cv@163.com), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
– Wu, Xiao-Jun (ORCID: 0000-0002-0310-5778; email: wu_xiaojun@jiangnan.edu.cn), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
– Xu, Tianyang (ORCID: 0000-0002-9015-3128; email: tianyang_xu@163.com), School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
CODEN: ITCTEM
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI: 10.1109/TCSVT.2021.3103677
Discipline: Engineering
EISSN: 1558-2205
Genre: Original research
Funding:
– National Key Research and Development Program of China, Grant 2017YFC1601800 (funder ID: 10.13039/501100012166)
– National Natural Science Foundation of China, Grants U1836218, 62020106012, and 61672265 (funder ID: 10.13039/501100001809)
– 111 Project of Ministry of Education of China, Grant B12018
ISSN: 1051-8215
Peer reviewed: Yes
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Journal Abbreviation: TCSVT
Subject Terms: action recognition; Convolutional neural networks; Feature extraction; Feature maps; Foreground-related features; Human activity recognition; Human motion; Image recognition; Iron; Modelling; Modules; Solid modeling; spatiotemporal modeling; Spatiotemporal phenomena; Three-dimensional displays; Two dimensional models
URI: https://ieeexplore.ieee.org/document/9509412; https://www.proquest.com/docview/2659346314