FEXNet: Foreground Extraction Network for Human Action Recognition
Published in | IEEE Transactions on Circuits and Systems for Video Technology, Vol. 32, No. 5, pp. 3141-3151
Main Authors | Shen, Zhongwei; Wu, Xiao-Jun; Xu, Tianyang
Format | Journal Article |
Language | English |
Published | New York: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.05.2022
Subjects | Action recognition; Convolutional neural networks; Feature extraction; Feature maps; Foreground-related features; Human activity recognition; Human motion; Image recognition; Iron; Modelling; Modules; Solid modeling; Spatiotemporal modeling; Spatiotemporal phenomena; Three-dimensional displays; Two dimensional models
Online Access | https://ieeexplore.ieee.org/document/9509412 ; https://www.proquest.com/docview/2659346314
Abstract | As most human actions in video sequences consist of continuous interactions between foreground subjects rather than with the background scene, disentangling these foregrounds from the background is important for advanced action recognition systems. In this paper we therefore propose a Foreground EXtraction (FEX) block that explicitly models foreground clues to manage action subjects effectively. The designed FEX block contains two components. The first is a Foreground Enhancement (FE) module, which highlights the feature channels potentially related to action attributes, providing channel-level refinement for the subsequent spatiotemporal modeling. The second is a Scene Segregation (SS) module, which splits feature maps into foreground and background: a temporal model with dynamic enhancement is constructed for the foreground part, reflecting the essential nature of the action category, while the background is modeled with simple spatial convolutions, mapping the inputs to a consistent feature space. FEX blocks can be inserted into existing 2D CNNs (the resulting network is denoted FEXNet) for spatiotemporal modeling that concentrates on foreground clues for effective action inference. Experiments on Something-Something V1, V2 and Kinetics400 verify the effectiveness of the proposed method.
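No code accompanies this record, but the abstract's two-module description is concrete enough to sketch its data flow. Below is a minimal PyTorch-style illustration, not the authors' implementation: the squeeze-and-excitation-style channel gate in FEModule, the sigmoid spatial mask and depthwise temporal convolution in SSModule, and every class name, shape, and hyperparameter here are assumptions made for illustration only.

```python
import torch
import torch.nn as nn


class FEModule(nn.Module):
    """Foreground Enhancement (sketch): channel-level reweighting that
    highlights channels assumed to carry action-related information."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N*T, C, H, W); squeeze spatial dims, then excite channels.
        w = self.fc(x.mean(dim=(2, 3)))        # (N*T, C) channel weights
        return x * w[:, :, None, None]


class SSModule(nn.Module):
    """Scene Segregation (sketch): a soft mask splits features into
    foreground and background; the foreground gets a depthwise temporal
    convolution, the background a plain spatial convolution."""

    def __init__(self, channels: int, n_segment: int):
        super().__init__()
        self.n_segment = n_segment                    # frames per clip
        self.gate = nn.Conv2d(channels, channels, 1)  # hypothetical mask head
        self.temporal = nn.Conv3d(channels, channels, (3, 1, 1),
                                  padding=(1, 0, 0), groups=channels)
        self.spatial = nn.Conv2d(channels, channels, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        m = torch.sigmoid(self.gate(x))        # per-pixel foreground weight
        fg, bg = x * m, x * (1.0 - m)
        nt, c, h, w = fg.shape
        n, t = nt // self.n_segment, self.n_segment
        # Fold frames into a temporal axis: (N, C, T, H, W).
        fg = fg.view(n, t, c, h, w).permute(0, 2, 1, 3, 4)
        fg = self.temporal(fg)                 # temporal modeling, foreground only
        fg = fg.permute(0, 2, 1, 3, 4).reshape(nt, c, h, w)
        return fg + self.spatial(bg)           # recombine both streams


class FEXBlock(nn.Module):
    """FE followed by SS, inserted after a 2D conv stage of a backbone."""

    def __init__(self, channels: int, n_segment: int = 8):
        super().__init__()
        self.fe = FEModule(channels)
        self.ss = SSModule(channels, n_segment)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.ss(self.fe(x))         # residual connection


# Usage: batch of 2 clips of 8 frames, 64-channel stage features.
feats = torch.randn(2 * 8, 64, 56, 56)
print(FEXBlock(64, n_segment=8)(feats).shape)  # torch.Size([16, 64, 56, 56])
```

Even in this toy version, the design point the abstract emphasizes survives: only the foreground stream receives temporal modeling, so motion capacity is spent on the action subject rather than on the scene.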
Author Details | Zhongwei Shen (ORCID 0000-0002-6701-1965; shenzw_cv@163.com), Xiao-Jun Wu (ORCID 0000-0002-0310-5778; wu_xiaojun@jiangnan.edu.cn), Tianyang Xu (ORCID 0000-0002-9015-3128; tianyang_xu@163.com); all with the School of Artificial Intelligence and Computer Science, Jiangnan University, Wuxi, China
CODEN | ITCTEM |
Copyright | The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022
DOI | 10.1109/TCSVT.2021.3103677 |
Discipline | Engineering |
EISSN | 1558-2205 |
Genre | Original research
Grant Information | National Key Research and Development Program of China (Grant 2017YFC1601800); National Natural Science Foundation of China (Grants U1836218, 62020106012, 61672265); 111 Project of the Ministry of Education of China (Grant B12018)
ISSN | 1051-8215 |
IsPeerReviewed | true |
IsScholarly | true |
License | https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html https://doi.org/10.15223/policy-029 https://doi.org/10.15223/policy-037 |