Flow-Guided Feature Aggregation for Video Object Detection
Published in | Proceedings / IEEE International Conference on Computer Vision, pp. 408-417 |
---|---|
Main Authors | Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.10.2017 |
Subjects | Detectors; Feature extraction; Object detection; Optical imaging; Target tracking; Training |
Abstract | Extending state-of-the-art object detectors from images to video is challenging. Detection accuracy suffers from degraded object appearance in videos, e.g., motion blur, video defocus, and rare poses. Existing work attempts to exploit temporal information at the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate, end-to-end learning framework for video object detection. It instead leverages temporal coherence at the feature level: it improves the per-frame features by aggregating nearby features along the motion paths, and thus improves video recognition accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [33], especially for more challenging fast-moving objects. Our framework is principled and on par with the best engineered systems that won the ImageNet VID challenges in 2016, without additional bells and whistles. The code will be released. |
---|---|
Author | Jifeng Dai; Yichen Wei; Lu Yuan; Yujie Wang; Xizhou Zhu |
Author_xml | – sequence: 1 surname: Xizhou Zhu fullname: Xizhou Zhu email: ezra0408@mail.ustc.edu.cn organization: Univ. of Sci. & Technol. of China, Hefei, China – sequence: 2 surname: Yujie Wang fullname: Yujie Wang email: v-yujiwa@microsoft.com – sequence: 3 surname: Jifeng Dai fullname: Jifeng Dai email: jifdai@microsoft.com – sequence: 4 surname: Lu Yuan fullname: Lu Yuan email: luyuan@microsoft.com – sequence: 5 surname: Yichen Wei fullname: Yichen Wei email: yichenw@microsoft.com |
CODEN | IEEPAD |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICCV.2017.52 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Electronic Library (IEL) - NZ; IEEE Proceedings Order Plans (POP) 1998-present |
DatabaseTitleList | |
Database_xml | – sequence: 1 dbid: RIE name: IEEE/IET Electronic Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Applied Sciences |
EISBN | 9781538610329; 1538610329 |
EISSN | 2380-7504 |
EndPage | 417 |
ExternalDocumentID | 8237314 |
Genre | orig-research |
GroupedDBID | 29O 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR AAWTH ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IPLJI M43 OCL RIE RIL RIO RNS |
ID | FETCH-LOGICAL-i241t-b76a241e326da422a6477a784ce546a1798c7cbb95f9b184c63f2cfdf4f6cbc33 |
IEDL.DBID | RIE |
IngestDate | Wed Aug 27 02:41:54 EDT 2025 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-i241t-b76a241e326da422a6477a784ce546a1798c7cbb95f9b184c63f2cfdf4f6cbc33 |
PageCount | 10 |
ParticipantIDs | ieee_primary_8237314 |
PublicationCentury | 2000 |
PublicationDate | 2017-10 |
PublicationDateYYYYMMDD | 2017-10-01 |
PublicationDate_xml | – month: 10 year: 2017 text: 2017-10 |
PublicationDecade | 2010 |
PublicationTitle | Proceedings / IEEE International Conference on Computer Vision |
PublicationTitleAbbrev | ICCV |
PublicationYear | 2017 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0039286 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 408 |
SubjectTerms | Detectors; Feature extraction; Object detection; Optical imaging; Target tracking; Training |
Title | Flow-Guided Feature Aggregation for Video Object Detection |
URI | https://ieeexplore.ieee.org/document/8237314 |
hasFullText | 1 |
inHoldings | 1 |
isFullTextHit | |
isPrint | |
linkProvider | IEEE |
openUrl | ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Abook&rft.genre=proceeding&rft.title=Proceedings+%2F+IEEE+International+Conference+on+Computer+Vision&rft.atitle=Flow-Guided+Feature+Aggregation+for+Video+Object+Detection&rft.au=Xizhou+Zhu&rft.au=Yujie+Wang&rft.au=Jifeng+Dai&rft.au=Lu+Yuan&rft.date=2017-10-01&rft.pub=IEEE&rft.eissn=2380-7504&rft.spage=408&rft.epage=417&rft_id=info:doi/10.1109%2FICCV.2017.52&rft.externalDocID=8237314 |
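
The abstract's core idea is to warp the features of nearby frames along the motion paths and then aggregate them into the current frame's features. That idea can be illustrated with a minimal NumPy sketch. This is not the authors' released code, and several parts are assumptions:

- The flow fields are taken as given, since this record does not say how motion is estimated.
- The flow convention is assumed: channel 0 is the x displacement and channel 1 is the y displacement, from the reference frame into each neighbor.
- The adaptive weights are an illustrative choice: a per-position cosine similarity to the reference frame, computed directly on the features and normalized with a softmax over frames.
- The names `warp_features` and `aggregate` are hypothetical.

```python
import numpy as np


def warp_features(feat, flow):
    """Bilinearly sample feat (C, H, W) at positions p + flow[:, p].

    Assumed convention (hypothetical): flow has shape (2, H, W); flow[0] is the
    x (column) displacement and flow[1] the y (row) displacement mapping each
    reference-frame position to its match in the neighbor frame.
    """
    _, h, w = feat.shape
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Sampling positions in the neighbor frame, clamped to the feature map.
    x = np.clip(xs + flow[0], 0, w - 1)
    y = np.clip(ys + flow[1], 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    wx, wy = x - x0, y - y0
    # Bilinear interpolation over the four surrounding grid points.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate(ref_feat, neighbor_feats, flows, eps=1e-8):
    """Weighted sum of the reference features and flow-warped neighbor features.

    Weights (an assumption for illustration): cosine similarity to the
    reference at each position, softmax-normalized across frames.
    """
    warped = [ref_feat] + [warp_features(f, fl) for f, fl in zip(neighbor_feats, flows)]
    stack = np.stack(warped)  # (K, C, H, W), reference frame included
    ref_unit = ref_feat / (np.linalg.norm(ref_feat, axis=0, keepdims=True) + eps)
    stack_unit = stack / (np.linalg.norm(stack, axis=1, keepdims=True) + eps)
    sim = np.sum(stack_unit * ref_unit[None], axis=1)  # (K, H, W)
    # Numerically stable softmax over the frame axis.
    weights = np.exp(sim - sim.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)
    return np.sum(weights[:, None] * stack, axis=0)  # (C, H, W)


# Usage: aggregate two neighbors into a reference frame (zero motion for the demo).
rng = np.random.default_rng(0)
ref = rng.standard_normal((8, 4, 5))  # C=8 channels on a 4x5 grid
neighbors = [rng.standard_normal((8, 4, 5)) for _ in range(2)]
flows = [np.zeros((2, 4, 5)) for _ in range(2)]
out = aggregate(ref, neighbors, flows)
print(out.shape)  # (8, 4, 5)
```

Neighbors that disagree with the reference at a position receive lower weight there. This mirrors, in simplified form, the abstract's point that aggregation along motion paths can compensate for degraded frames such as those with motion blur.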