Flow-Guided Feature Aggregation for Video Object Detection

Bibliographic Details
Published in Proceedings / IEEE International Conference on Computer Vision pp. 408-417
Main Authors Xizhou Zhu, Yujie Wang, Jifeng Dai, Lu Yuan, Yichen Wei
Format Conference Proceeding
Language English
Published IEEE 01.10.2017
Abstract Extending state-of-the-art object detectors from image to video is challenging. Detection accuracy suffers from degraded object appearances in videos, e.g., motion blur, video defocus, and rare poses. Existing work attempts to exploit temporal information at the box level, but such methods are not trained end-to-end. We present flow-guided feature aggregation, an accurate, end-to-end learning framework for video object detection. It instead leverages temporal coherence at the feature level: it improves the per-frame features by aggregating nearby features along the motion paths, and thus improves video recognition accuracy. Our method significantly improves upon strong single-frame baselines on ImageNet VID [33], especially for the more challenging fast-moving objects. Our framework is principled and on par with the best engineered systems that won the ImageNet VID challenges 2016, without additional bells and whistles. The code will be released.
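The abstract does not give implementation details, so the following NumPy sketch is only a rough illustration of "aggregation of nearby features along the motion paths". It assumes that a flow field from the current frame to each neighbouring frame is already available, that warping is done by bilinear sampling, and that the per-location aggregation weights come from a softmax over cosine similarity computed directly on the features. The names `warp_features` and `aggregate_features`, the array layouts, and the weighting scheme are all illustrative assumptions, not the authors' code.

```python
import numpy as np


def warp_features(feat, flow):
    """Bilinearly sample feat (C, H, W) at locations displaced by flow (2, H, W).

    Assumption: flow[0] is the x displacement and flow[1] the y displacement,
    pointing from each location in the current frame to its match in the
    neighbouring frame. Out-of-range coordinates are clamped to the border.
    """
    _, height, width = feat.shape
    ys, xs = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    x = np.clip(xs + flow[0], 0, width - 1)
    y = np.clip(ys + flow[1], 0, height - 1)
    x0 = np.floor(x).astype(int)
    y0 = np.floor(y).astype(int)
    x1 = np.minimum(x0 + 1, width - 1)
    y1 = np.minimum(y0 + 1, height - 1)
    wx = x - x0
    wy = y - y0
    # Interpolate along x on the two bracketing rows, then along y.
    top = feat[:, y0, x0] * (1 - wx) + feat[:, y0, x1] * wx
    bottom = feat[:, y1, x0] * (1 - wx) + feat[:, y1, x1] * wx
    return top * (1 - wy) + bottom * wy


def aggregate_features(current, neighbours, flows, eps=1e-8):
    """Blend the current frame's features with flow-warped neighbour features.

    Weights are a per-location softmax over the cosine similarity between each
    warped feature map and the current one (a simplifying assumption).
    """
    warped = [current] + [warp_features(f, fl) for f, fl in zip(neighbours, flows)]
    warped = np.stack(warped)                                  # (N, C, H, W)
    dot = (warped * current).sum(axis=1)                       # (N, H, W)
    norms = np.linalg.norm(warped, axis=1) * np.linalg.norm(current, axis=0)
    similarity = dot / (norms + eps)
    weights = np.exp(similarity - similarity.max(axis=0, keepdims=True))
    weights /= weights.sum(axis=0, keepdims=True)              # sums to 1 over frames
    return (weights[:, None] * warped).sum(axis=0)             # (C, H, W)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cur = rng.random((4, 6, 8))
    prev, nxt = rng.random((4, 6, 8)), rng.random((4, 6, 8))
    zero_flow = np.zeros((2, 6, 8))
    out = aggregate_features(cur, [prev, nxt], [zero_flow, zero_flow])
    print(out.shape)
```

In this sketch, a neighbouring frame whose warped features resemble the current frame's features at a given location contributes more at that location, so a blurred or defocused frame can borrow evidence from sharper neighbours. In a trained detector the flow and the weights would be produced by learned components and optimized jointly with the detection loss; here they are fixed inputs and a fixed formula.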
Author_xml – sequence: 1
  surname: Xizhou Zhu
  fullname: Xizhou Zhu
  email: ezra0408@mail.ustc.edu.cn
  organization: Univ. of Sci. & Technol. of China, Hefei, China
– sequence: 2
  surname: Yujie Wang
  fullname: Yujie Wang
  email: v-yujiwa@microsoft.com
– sequence: 3
  surname: Jifeng Dai
  fullname: Jifeng Dai
  email: jifdai@microsoft.com
– sequence: 4
  surname: Lu Yuan
  fullname: Lu Yuan
  email: luyuan@microsoft.com
– sequence: 5
  surname: Yichen Wei
  fullname: Yichen Wei
  email: yichenw@microsoft.com
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/ICCV.2017.52
Discipline Applied Sciences
EISBN 9781538610329
1538610329
EISSN 2380-7504
EndPage 417
ExternalDocumentID 8237314
Genre orig-research
IsPeerReviewed false
IsScholarly true
PageCount 10
PublicationCentury 2000
PublicationDate 2017-10
PublicationDateYYYYMMDD 2017-10-01
PublicationDate_xml – month: 10
  year: 2017
  text: 2017-10
PublicationDecade 2010
PublicationTitle Proceedings / IEEE International Conference on Computer Vision
PublicationTitleAbbrev ICCV
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 408
SubjectTerms Detectors
Feature extraction
Object detection
Optical imaging
Target tracking
Training
Title Flow-Guided Feature Aggregation for Video Object Detection
URI https://ieeexplore.ieee.org/document/8237314