Long-term recurrent convolutional networks for visual recognition and description

Bibliographic Details
Published in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 2625-2634
Main Authors Donahue, Jeff; Hendricks, Lisa Anne; Guadarrama, Sergio; Rohrbach, Marcus; Venugopalan, Subhashini; Darrell, Trevor; Saenko, Kate
Format Conference Proceeding, Journal Article
Language English
Published IEEE 01.06.2015
ISSN 1063-6919
DOI 10.1109/CVPR.2015.7298878

Abstract Models based on deep convolutional networks have dominated recent image interpretation tasks; we investigate whether models which are also recurrent, or "temporally deep", are effective for tasks involving sequences, visual and otherwise. We develop a novel recurrent convolutional architecture suitable for large-scale visual learning which is end-to-end trainable, and demonstrate the value of these models on benchmark video recognition tasks, image description and retrieval problems, and video narration challenges. In contrast to current models which assume a fixed spatio-temporal receptive field or simple temporal averaging for sequential processing, recurrent convolutional models are "doubly deep" in that they can be compositional in spatial and temporal "layers". Such models may have advantages when target concepts are complex and/or training data are limited. Learning long-term dependencies is possible when nonlinearities are incorporated into the network state updates. Long-term RNN models are appealing in that they can directly map variable-length inputs (e.g., video frames) to variable-length outputs (e.g., natural language text) and can model complex temporal dynamics; yet they can be optimized with backpropagation. Our recurrent long-term models are directly connected to modern visual convnet models and can be jointly trained to simultaneously learn temporal dynamics and convolutional perceptual representations. Our results show such models have distinct advantages over state-of-the-art models for recognition or generation which are separately defined and/or optimized.
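Note: the "doubly deep" idea in the abstract (a visual encoder applied to each frame, feeding a recurrent model that carries state across time) can be illustrated with a minimal sketch. This is not the authors' implementation. The single linear-plus-ReLU layer stands in for a real convnet, the weights are random and untrained, and every name here (`encode_frame`, `lstm_step`, `make_model`, `label_frames`) is hypothetical, chosen only for illustration.

```python
import math
import random

def encode_frame(W, frame):
    """Stand-in for a convnet: one linear layer followed by ReLU (assumption)."""
    return [max(0.0, sum(w * x for w, x in zip(row, frame))) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(params, x, h, c):
    """One LSTM update; each gate sees the input x concatenated with the previous h."""
    z = x + h  # list concatenation
    def gate(name, act):
        W, b = params[name]
        return [act(sum(w * zi for w, zi in zip(row, z)) + bi)
                for row, bi in zip(W, b)]
    i = gate("input", sigmoid)
    f = gate("forget", sigmoid)
    o = gate("output", sigmoid)
    g = gate("cell", math.tanh)
    # Nonlinear, gated state update: keep part of the old cell, add new content.
    c_new = [fj * cj + ij * gj for fj, cj, ij, gj in zip(f, c, i, g)]
    h_new = [oj * math.tanh(cj) for oj, cj in zip(o, c_new)]
    return h_new, c_new

def softmax(v):
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def rand_matrix(rng, rows, cols):
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def make_model(rng, frame_dim, feat_dim, hidden_dim, n_classes):
    """Random, untrained weights; real models would learn these jointly."""
    lstm = {name: (rand_matrix(rng, hidden_dim, feat_dim + hidden_dim),
                   [0.0] * hidden_dim)
            for name in ("input", "forget", "output", "cell")}
    return {"encoder": rand_matrix(rng, feat_dim, frame_dim),
            "lstm": lstm,
            "classifier": rand_matrix(rng, n_classes, hidden_dim),
            "hidden_dim": hidden_dim}

def label_frames(model, frames):
    """Return one class distribution per frame for a sequence of any length."""
    h = [0.0] * model["hidden_dim"]
    c = [0.0] * model["hidden_dim"]
    outputs = []
    for frame in frames:
        feat = encode_frame(model["encoder"], frame)      # spatial "layer"
        h, c = lstm_step(model["lstm"], feat, h, c)       # temporal "layer"
        logits = [sum(w * hj for w, hj in zip(row, h))
                  for row in model["classifier"]]
        outputs.append(softmax(logits))
    return outputs

rng = random.Random(0)
model = make_model(rng, frame_dim=6, feat_dim=4, hidden_dim=3, n_classes=2)
clip = [[rng.random() for _ in range(6)] for _ in range(5)]
predictions = label_frames(model, clip)
```

Because the same encoder and recurrent step are reused at every time step, the sketch accepts clips of any length, which is the variable-length property the abstract highlights.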
Author_xml – sequence: 1
  givenname: Jeff
  surname: Donahue
  fullname: Donahue, Jeff
  email: jdonahue@eecs.berkeley.edu
  organization: UC Berkeley, Berkeley, CA, USA
– sequence: 2
  givenname: Lisa Anne
  surname: Hendricks
  fullname: Hendricks, Lisa Anne
  email: lisa-anne@eecs.berkeley.edu
  organization: UC Berkeley, Berkeley, CA, USA
– sequence: 3
  givenname: Sergio
  surname: Guadarrama
  fullname: Guadarrama, Sergio
  email: sguada@eecs.berkeley.edu
  organization: UC Berkeley, Berkeley, CA, USA
– sequence: 4
  givenname: Marcus
  surname: Rohrbach
  fullname: Rohrbach, Marcus
  email: rohrbach@eecs.berkeley.edu
  organization: UC Berkeley, Berkeley, CA, USA
– sequence: 5
  givenname: Subhashini
  surname: Venugopalan
  fullname: Venugopalan, Subhashini
  email: vsub@cs.utexas.edu
  organization: UT Austin, Austin, TX, USA
– sequence: 6
  givenname: Trevor
  surname: Darrell
  fullname: Darrell, Trevor
  email: trevor@eecs.berkeley.edu
  organization: UC Berkeley, Berkeley, CA, USA
– sequence: 7
  givenname: Kate
  surname: Saenko
  fullname: Saenko, Kate
  email: saenko@cs.uml.edu
  organization: UMass Lowell, Lowell, MA, USA
DatabaseName IEEE Electronic Library (IEL) Conference Proceedings
IEEE Proceedings Order Plan (POP) 1998-present by volume
IEEE Xplore All Conference Proceedings
IEEE Xplore Digital Library
IEEE Proceedings Order Plans (POP) 1998-present
Computer and Information Systems Abstracts
Technology Research Database
ProQuest Computer Science Collection
Advanced Technologies Database with Aerospace
Computer and Information Systems Abstracts – Academic
Computer and Information Systems Abstracts Professional
Discipline Applied Sciences
Computer Science
EISBN 1467369640
9781467369640
EISSN 1063-6919
EndPage 2634
ExternalDocumentID 7298878
Genre orig-research
IsPeerReviewed false
IsScholarly true
PageCount 10
SubjectTerms Computational modeling
Computer architecture
Data models
Image recognition
Logic gates
Mathematical models
Microprocessors
Networks
Pattern recognition
Recognition
Tasks
Temporal logic
Visual
Visualization
URI https://ieeexplore.ieee.org/document/7298878
https://www.proquest.com/docview/1770308121
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
link http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1LS8QwEA7rnjz5WvFNBI-29pk258VFxJVVXPFW8hRRWtndevDXO9OmFdSDtzZQ2maSmW8y38wQchYhCFEx8xJrhZfkqfQA1EsvMFZEOgCLH2Pu8PSWXc2T66f0aUDO-1wYY0xDPjM-XjaxfF2pGo_KLgAIwp7I18gaOG5trla3duIA41cO-qAWjsGzYbyPKETYjaWJfLLYYzzkLsIZBvxi_Di7R5JX6rsXuE4rv9RzY3MmG2TafW1LNXn165X01eePQo7__Z1NMvrO7qOz3m5tkYEpt8mGg6PUbfYlDHUdH7qxHXJ3U5XPHmpzusCDeiztRJG47haweKNlyytfUkDD9ONlWcNYz1KqSipKTbXpldWIzCeXD-MrzzVl8F7AgV2BTANmQytTzbgIFUsjrkSYc53DjQ0ylSQqz6xKjRQgdZZZEUoldISF5GMAB7tkWFal2SM0ZlomAElUFqeJ5CJXmjHLuFQRSMmofbKDk1a8t3U3Cjdf--S0E0sBewEDHKI0Vb0swgz1F0CW8ODvRw_JOsq5pXodkeFqUZtjABUredKspi-ioclN
linkProvider IEEE
linkToHtml http://utb.summon.serialssolutions.com/2.0.0/link/0/eLvHCXMwjV1LT9xADLZ4HOAE5SFebQeJI1nynGTOqGjb7iJAgLhF80QIlCB2w6G_vnYyCRJw4JZYipKMZ-xvxp9tgKOYQIhOeJA6J4O0yFSAoF4FoXUyNiF6_IRyh6fnfHyT_rnL7hbgeMiFsda25DM7oss2lm9q3dBR2QkCQVwTxSIsZ5SM22Vr9bMnCSmC5cEP2eEE9zZcDDGFmPqxtLFPngRcRMLHOKNQnJzeXlwRzSsb-Vf4XisfDHTrdc7WYNp_b0c2eRw1czXS_96VcvzqD63D1lt-H7sYPNc3WLDVBqx5QMr8cp-hqO_50Ms24XJSV_cB2XP2Qkf1VNyJEXXdT2H5xKqOWT5jiIfZ68OsQdnAU6orJivDjB3M1RbcnP26Ph0Hvi1D8IBb2DlqNeQuciozXMhI8ywWWkaFMAXeuDDXaaqL3OnMKol657mTkdLSxFRKPkF4sA1LVV3ZHWAJNypFUKLzJEuVkIU2nDsulI5RS1bvwiYNWvncVd4o_XjtwmGvlhJXA4U4ZGXrZlZGOVkwBC3R3ueP_oSV8fV0Uk5-n__dh1XSeUf8OoCl-UtjvyPEmKsf7cz6D0xvzJU
openUrl ctx_ver=Z39.88-2004&ctx_enc=info%3Aofi%2Fenc%3AUTF-8&rfr_id=info%3Asid%2Fsummon.serialssolutions.com&rft_val_fmt=info%3Aofi%2Ffmt%3Akev%3Amtx%3Ajournal&rft.genre=proceeding&rft.title=2015+IEEE+Conference+on+Computer+Vision+and+Pattern+Recognition+%28CVPR%29&rft.atitle=Long-term+recurrent+convolutional+networks+for+visual+recognition+and+description&rft.au=Donahue%2C+Jeff&rft.au=Hendricks%2C+Lisa+Anne&rft.au=Guadarrama%2C+Sergio&rft.au=Rohrbach%2C+Marcus&rft.date=2015-06-01&rft.pub=IEEE&rft.issn=1063-6919&rft.eissn=1063-6919&rft.spage=2625&rft.epage=2634&rft_id=info:doi/10.1109%2FCVPR.2015.7298878&rft.externalDocID=7298878
thumbnail_l http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/lc.gif&issn=1063-6919&client=summon
thumbnail_m http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/mc.gif&issn=1063-6919&client=summon
thumbnail_s http://covers-cdn.summon.serialssolutions.com/index.aspx?isbn=/sc.gif&issn=1063-6919&client=summon