MVF-Net: A Multi-View Fusion Network for Event-Based Object Classification
Event-based object recognition has drawn increasing attention for event cameras' distinguished advantages of low power consumption and high dynamic range. For this new modality, previous works based on customizing low-level descriptors are vulnerable to noise and with limited generalizability....
Saved in:
Published in | IEEE transactions on circuits and systems for video technology Vol. 32; no. 12; pp. 8275 - 8284 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published |
New York
IEEE
01.12.2022
The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Subjects | |
Online Access | Get full text |
Cover
Loading…
Summary: | Event-based object recognition has drawn increasing attention for event cameras' distinguished advantages of low power consumption and high dynamic range. For this new modality, previous works based on customizing low-level descriptors are vulnerable to noise and with limited generalizability. Although recent works turn to design various deep neural networks to extract event features, they either suffer from data insufficiency to fully train the event-based model or fail to encode spatial and temporal cues simultaneously with their single view network. In this work, we address these limitations by proposing a multi-view attention-aware network, in which an event stream is projected to multi-view 2D maps to utilize well-trained 2D models and explore spatio-temporal complements. Besides, the attention mechanism is used to boost the complements in different streams for better joint inference. Comprehensive experiments show the large superiority of our model over state-of-the-art methods as well as the efficacy of our multi-view fusion framework for event data. |
---|---|
Bibliography: | ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 14 |
ISSN: | 1051-8215 1558-2205 |
DOI: | 10.1109/TCSVT.2021.3073673 |