ECSNet: Spatio-Temporal Feature Learning for Event Camera

Bibliographic Details
Published in IEEE Transactions on Circuits and Systems for Video Technology, Vol. 33, No. 2, pp. 701-712
Main Authors Chen, Zhiwen, Wu, Jinjian, Hou, Junhui, Li, Leida, Dong, Weisheng, Shi, Guangming
Format Journal Article
Language English
Published New York IEEE 01.02.2023
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects

Abstract Neuromorphic event cameras can efficiently sense the latent geometric structures and motion cues of a scene by generating asynchronous, sparse event signals. Because of the irregular layout of these signals, leveraging their plentiful spatio-temporal information for recognition tasks remains a significant challenge. Existing methods tend to treat events as dense image-like or point-series representations; however, they either severely destroy the sparsity of the event data or fail to encode robust spatial cues. To fully exploit the inherent sparsity while reconciling the spatio-temporal information, we introduce a compact event representation, the 2D-1T event cloud sequence (2D-1T ECS). We couple this representation with a novel lightweight spatio-temporal learning framework (ECSNet) that accommodates both object classification and action recognition tasks. The core of the framework is a hierarchical spatial relation module: equipped with a specially designed surface-event-based sampling unit and a local event normalization unit that enhance inter-event relation encoding, it learns robust geometric features from the 2D event clouds. We further propose a motion attention module that efficiently captures the long-term temporal context evolving with the 1T cloud sequence. Empirically, our framework achieves performance on par with or better than the state of the art. Importantly, it cooperates well with the sparsity of event data without any sophisticated operations, leading to low computational cost and high inference speed.
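The two core ideas in the abstract can be made concrete with small sketches. First, a minimal Python/NumPy sketch (not the authors' code) of packing a raw event stream into a 2D-1T event cloud sequence: events arrive as (x, y, t, p) tuples, the time axis is sliced into fixed bins (the "1T" sequence), and the events inside each bin are kept as a sparse 2D point cloud over the sensor plane (the "2D" clouds). The function name, bin count, and per-cloud point budget are illustrative assumptions, and plain uniform subsampling stands in for the paper's surface-event-based sampling unit.

import numpy as np

def events_to_ecs(events, num_bins=8, max_points=1024, rng=None):
    """Pack an (N, 4) array of (x, y, t, p) events into a 2D-1T cloud sequence.

    Returns num_bins arrays, each (<=max_points, 3) holding (x, y, p) rows:
    one sparse 2D event cloud per temporal slice.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = events[:, 2]
    # Slice the time axis into equal bins: the "1T" sequence dimension.
    edges = np.linspace(t.min(), t.max() + 1e-9, num_bins + 1)
    bin_idx = np.clip(np.searchsorted(edges, t, side="right") - 1, 0, num_bins - 1)
    clouds = []
    for i in range(num_bins):
        cloud = events[bin_idx == i][:, [0, 1, 3]]  # keep (x, y, p) as a 2D cloud
        if len(cloud) > max_points:
            # Uniform subsampling here; the paper's surface-event-based sampling
            # unit is a more informed selection strategy (assumption: this stand-in).
            cloud = cloud[rng.choice(len(cloud), max_points, replace=False)]
        clouds.append(cloud)
    return clouds

Second, a generic single-head self-attention layer over per-slice feature vectors, as a hedged stand-in for how a motion attention module could aggregate long-term temporal context across the 1T sequence; it does not reproduce ECSNet's actual architecture.

import torch
import torch.nn as nn

class MotionAttention(nn.Module):
    """Self-attention across the temporal slices of a 2D-1T sequence.

    A generic stand-in for ECSNet's motion attention module, not the paper's design.
    """
    def __init__(self, dim):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)   # joint query/key/value projection
        self.proj = nn.Linear(dim, dim)      # output projection

    def forward(self, x):  # x: (batch, num_bins, dim), one feature per 2D cloud
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / x.shape[-1] ** 0.5, dim=-1)
        return x + self.proj(attn @ v)       # residual temporal context aggregation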
Author Shi, Guangming
Wu, Jinjian
Dong, Weisheng
Li, Leida
Hou, Junhui
Chen, Zhiwen
Author_xml – sequence: 1
  givenname: Zhiwen
  surname: Chen
  fullname: Chen, Zhiwen
  organization: School of Artificial Intelligence, Xidian University, Xi'an, China
– sequence: 2
  givenname: Jinjian
  orcidid: 0000-0001-7501-0009
  surname: Wu
  fullname: Wu, Jinjian
  email: jinjian.wu@mail.xidian.edu.cn
  organization: School of Artificial Intelligence, Xidian University, Xi'an, China
– sequence: 3
  givenname: Junhui
  orcidid: 0000-0003-3431-2021
  surname: Hou
  fullname: Hou, Junhui
  email: jh.hou@cityu.edu.hk
  organization: Department of Computer Science, City University of Hong Kong, Hong Kong, China
– sequence: 4
  givenname: Leida
  orcidid: 0000-0001-9069-8796
  surname: Li
  fullname: Li, Leida
  organization: School of Artificial Intelligence, Xidian University, Xi'an, China
– sequence: 5
  givenname: Weisheng
  orcidid: 0000-0002-9632-985X
  surname: Dong
  fullname: Dong, Weisheng
  organization: School of Artificial Intelligence, Xidian University, Xi'an, China
– sequence: 6
  givenname: Guangming
  orcidid: 0000-0003-2179-3292
  surname: Shi
  fullname: Shi, Guangming
  organization: School of Artificial Intelligence, Xidian University, Xi'an, China
CODEN ITCTEM
CitedBy_id crossref_primary_10_1109_TCSVT_2024_3482436
crossref_primary_10_1016_j_jmsy_2024_09_013
crossref_primary_10_1016_j_neunet_2024_106493
crossref_primary_10_1109_JAS_2024_124470
crossref_primary_10_1109_TCSVT_2023_3326294
crossref_primary_10_1109_TCSVT_2023_3272375
crossref_primary_10_1109_TCSVT_2023_3317976
crossref_primary_10_1109_TCSVT_2024_3495769
crossref_primary_10_1109_TCSVT_2023_3249195
crossref_primary_10_1007_s10489_024_05982_1
crossref_primary_10_1109_TIFS_2024_3409167
crossref_primary_10_1016_j_neucom_2025_129776
crossref_primary_10_1109_TCSVT_2024_3448615
crossref_primary_10_1109_TCSVT_2023_3301176
crossref_primary_10_1109_JSEN_2024_3524301
crossref_primary_10_1016_j_eswa_2024_126255
Cites_doi 10.1109/CVPR.2019.00398
10.1109/CVPR.2018.00186
10.1109/ICASSP.2019.8683606
10.1109/WACV51458.2022.00073
10.1109/CVPR.2018.00685
10.1109/TMM.2020.2965434
10.1109/TPAMI.2020.3008413
10.1109/TCSVT.2018.2841516
10.1109/CVPR.2019.00108
10.1109/ICCV.2019.00573
10.1109/TC.2021.3119180
10.1109/TPAMI.2019.2919301
10.1109/CVPR.2019.00344
10.1109/TIP.2020.3023597
10.1007/978-3-030-58565-5_9
10.1109/CVPR.2018.00568
10.1109/CVPR.2016.90
10.1109/JSSC.2010.2085952
10.1109/jssc.2007.914337
10.1109/LRA.2020.3002480
10.3389/fnins.2015.00437
10.1109/tcsvt.2021.3073673
10.3389/fnins.2017.00309
10.1109/CVPR42600.2020.01112
10.1109/CVPR42600.2020.00580
10.3389/fnins.2015.00481
10.1109/CVPR.2019.00401
10.1109/ICCV.2019.00058
10.1109/ICACI49185.2020.9177628
10.1109/ICECS.2018.8617982
10.1109/CVPR.2017.502
10.1109/tpami.2022.3161735
10.1109/TPAMI.2015.2392947
10.1109/WACV.2019.00199
10.1109/CVPR.2017.781
10.1109/ICCV.2015.510
10.1109/CVPR.2018.00675
10.1007/s11263-014-0788-3
10.1109/ISCAS45731.2020.9181247
10.1109/TPAMI.2016.2574707
10.1109/tcsvt.2022.3156653
10.1109/TIP.2021.3077136
10.1109/TCSVT.2020.3044287
10.1109/CVPR46437.2021.01398
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2023
DBID 97E
RIA
RIE
AAYXX
CITATION
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
DOI 10.1109/TCSVT.2022.3202659
DatabaseName IEEE All-Society Periodicals Package (ASPP) 2005–Present
IEEE All-Society Periodicals Package (ASPP) 1998–Present
IEEE Xplore
CrossRef
Computer and Information Systems Abstracts
Electronics & Communications Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
DatabaseTitle CrossRef
Technology Research Database
Computer and Information Systems Abstracts – Academic
Electronics & Communications Abstracts
ProQuest Computer Science Collection
Computer and Information Systems Abstracts
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts Professional
DatabaseTitleList Technology Research Database

Database_xml – sequence: 1
  dbid: RIE
  name: IEEE Electronic Library (IEL)
  url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/
  sourceTypes: Publisher
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
EISSN 1558-2205
EndPage 712
ExternalDocumentID 10_1109_TCSVT_2022_3202659
9869656
Genre orig-research
GrantInformation_xml – fundername: National Natural Science Foundation of China
  grantid: 62022063
  funderid: 10.13039/501100001809
GroupedDBID -~X
0R~
29I
4.4
5GY
5VS
6IK
97E
AAJGR
AARMG
AASAJ
AAWTH
ABAZT
ABQJQ
ABVLG
ACGFO
ACGFS
ACIWK
AENEX
AETIX
AGQYO
AGSQL
AHBIQ
AI.
AIBXA
AKJIK
AKQYR
ALLEH
ALMA_UNASSIGNED_HOLDINGS
ASUFR
ATWAV
BEFXN
BFFAM
BGNUA
BKEBE
BPEOZ
CS3
DU5
EBS
EJD
HZ~
H~9
ICLAB
IFIPE
IFJZH
IPLJI
JAVBF
LAI
M43
O9-
OCL
P2P
RIA
RIE
RNS
RXW
TAE
TN5
VH1
AAYXX
CITATION
RIG
7SC
7SP
8FD
JQ2
L7M
L~C
L~D
IEDL.DBID RIE
ISSN 1051-8215
IngestDate Mon Jun 30 03:45:05 EDT 2025
Tue Jul 01 00:41:18 EDT 2025
Thu Apr 24 22:57:33 EDT 2025
Wed Aug 27 02:48:19 EDT 2025
IsPeerReviewed true
IsScholarly true
Issue 2
Language English
License https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html
https://doi.org/10.15223/policy-029
https://doi.org/10.15223/policy-037
LinkModel DirectLink
Notes ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 14
ORCID 0000-0001-9069-8796
0000-0003-2179-3292
0000-0001-7501-0009
0000-0003-3431-2021
0000-0002-9632-985X
PQID 2773447991
PQPubID 85433
PageCount 12
ParticipantIDs proquest_journals_2773447991
crossref_primary_10_1109_TCSVT_2022_3202659
crossref_citationtrail_10_1109_TCSVT_2022_3202659
ieee_primary_9869656
ProviderPackageCode CITATION
AAYXX
PublicationCentury 2000
PublicationDate 2023-02-01
PublicationDateYYYYMMDD 2023-02-01
PublicationDate_xml – month: 02
  year: 2023
  text: 2023-02-01
  day: 01
PublicationDecade 2020
PublicationPlace New York
PublicationPlace_xml – name: New York
PublicationTitle IEEE transactions on circuits and systems for video technology
PublicationTitleAbbrev TCSVT
PublicationYear 2023
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
References ref13
ref15
ref14
Vaswani (ref40); 30
ref11
ref10
ref17
ref16
ref19
ref18
ref51
ref50
ref46
ref45
ref48
ref47
ref42
Qi (ref31); 30
ref44
ref43
ref49
ref8
ref7
ref9
ref4
ref3
ref6
ref5
Lei Ba (ref34) 2016
ref35
ref37
ref30
ref32
Neil (ref12)
ref2
ref1
ref39
ref38
Dosovitskiy (ref41) 2020
Ioffe (ref33)
ref24
ref23
ref26
ref25
ref20
ref22
ref21
ref28
ref27
ref29
Bertasius (ref36); 2
References_xml – ident: ref19
  doi: 10.1109/CVPR.2019.00398
– ident: ref26
  doi: 10.1109/CVPR.2018.00186
– volume: 2
  start-page: 4
  volume-title: Proc. ICML
  ident: ref36
  article-title: Is space-time attention all you need for video understanding?
– ident: ref18
  doi: 10.1109/ICASSP.2019.8683606
– ident: ref38
  doi: 10.1109/WACV51458.2022.00073
– ident: ref48
  doi: 10.1109/CVPR.2018.00685
– ident: ref35
  doi: 10.1109/TMM.2020.2965434
– ident: ref6
  doi: 10.1109/TPAMI.2020.3008413
– ident: ref3
  doi: 10.1109/TCSVT.2018.2841516
– ident: ref17
  doi: 10.1109/CVPR.2019.00108
– ident: ref8
  doi: 10.1109/ICCV.2019.00573
– start-page: 1
  volume-title: Proc. NIPS
  ident: ref12
  article-title: Phased LSTM: Accelerating recurrent network training for long or event-based sequences
– ident: ref24
  doi: 10.1109/TC.2021.3119180
– ident: ref27
  doi: 10.1109/TPAMI.2019.2919301
– ident: ref29
  doi: 10.1109/CVPR.2019.00344
– ident: ref16
  doi: 10.1109/TIP.2020.3023597
– ident: ref9
  doi: 10.1007/978-3-030-58565-5_9
– ident: ref7
  doi: 10.1109/CVPR.2018.00568
– start-page: 448
  volume-title: Proc. Int. Conf. Mach. Learn.
  ident: ref33
  article-title: Batch normalization: Accelerating deep network training by reducing internal covariate shift
– ident: ref51
  doi: 10.1109/CVPR.2016.90
– ident: ref2
  doi: 10.1109/JSSC.2010.2085952
– volume: 30
  start-page: 1
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  ident: ref31
  article-title: PointNet++: Deep hierarchical feature learning on point sets in a metric space
– ident: ref1
  doi: 10.1109/jssc.2007.914337
– ident: ref20
  doi: 10.1109/LRA.2020.3002480
– ident: ref42
  doi: 10.3389/fnins.2015.00437
– ident: ref21
  doi: 10.1109/tcsvt.2021.3073673
– year: 2020
  ident: ref41
  article-title: An image is worth 16×16 words: Transformers for image recognition at scale
  publication-title: arXiv:2010.11929
– ident: ref44
  doi: 10.3389/fnins.2017.00309
– ident: ref32
  doi: 10.1109/CVPR42600.2020.01112
– ident: ref28
  doi: 10.1109/CVPR42600.2020.00580
– ident: ref43
  doi: 10.3389/fnins.2015.00481
– volume: 30
  start-page: 1
  volume-title: Proc. Adv. Neural Inf. Process. Syst.
  ident: ref40
  article-title: Attention is all you need
– ident: ref13
  doi: 10.1109/CVPR.2019.00401
– ident: ref30
  doi: 10.1109/ICCV.2019.00058
– ident: ref23
  doi: 10.1109/ICACI49185.2020.9177628
– ident: ref22
  doi: 10.1109/ICECS.2018.8617982
– ident: ref46
  doi: 10.1109/CVPR.2017.502
– ident: ref39
  doi: 10.1109/tpami.2022.3161735
– ident: ref11
  doi: 10.1109/TPAMI.2015.2392947
– ident: ref14
  doi: 10.1109/WACV.2019.00199
– ident: ref45
  doi: 10.1109/CVPR.2017.781
– ident: ref47
  doi: 10.1109/ICCV.2015.510
– ident: ref49
  doi: 10.1109/CVPR.2018.00675
– ident: ref10
  doi: 10.1007/s11263-014-0788-3
– ident: ref15
  doi: 10.1109/ISCAS45731.2020.9181247
– ident: ref25
  doi: 10.1109/TPAMI.2016.2574707
– ident: ref5
  doi: 10.1109/tcsvt.2022.3156653
– ident: ref50
  doi: 10.1109/TIP.2021.3077136
– ident: ref4
  doi: 10.1109/TCSVT.2020.3044287
– year: 2016
  ident: ref34
  article-title: Layer normalization
  publication-title: arXiv:1607.06450
– ident: ref37
  doi: 10.1109/CVPR46437.2021.01398
SSID ssj0014847
Score 2.5642264
SourceID proquest
crossref
ieee
SourceType Aggregation Database
Enrichment Source
Index Database
Publisher
StartPage 701
SubjectTerms action recognition
Brightness
Cameras
Cloud computing
Data mining
Event camera
Feature extraction
Machine learning
Modules
Moving object recognition
object classification
Representation learning
Representations
Robustness
Sparsity
spatio-temporal feature learning
Task analysis
Weight reduction
Title ECSNet: Spatio-Temporal Feature Learning for Event Camera
URI https://ieeexplore.ieee.org/document/9869656
https://www.proquest.com/docview/2773447991
Volume 33
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider IEEE