Enhancing Video Summarization via Vision-Language Embedding

Bibliographic Details
Published in 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1052-1060
Main Authors Plummer, Bryan A.; Brown, Matthew; Lazebnik, Svetlana
Format Conference Proceeding
Language English
Published IEEE 01.07.2017
Abstract This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story. We show that visual representations supervised by freeform language make a good fit for this application by extending a recent submodular summarization approach [9] with representativeness and interestingness objectives computed on features from a joint vision-language embedding space. We perform an evaluation on two diverse datasets, UT Egocentric [18] and TV Episodes [45], and show that our new objectives give improved summarization ability compared to standard visual features alone. Our experiments also show that the vision-language embedding need not be trained on domain-specific data, but can be learned from standard still image vision-language datasets and transferred to video. A further benefit of our model is the ability to guide a summary using freeform text input at test time, allowing user customization.
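
The abstract describes selecting summary segments by optimizing submodular objectives over embedding features. As a rough illustration only, the sketch below greedily maximizes a weighted sum of a facility-location style representativeness term and a query-similarity interestingness term. It is not the authors' implementation: the exact objectives are not given in this record, and every name here (`sim`, `representativeness`, `interestingness`, `greedy_summary`, `w_rep`, `w_int`) as well as the toy 2-D "embeddings" are assumptions made for the example.

```python
import math


def sim(u, v):
    """Cosine similarity clamped at 0 so both objectives stay non-negative."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return max(0.0, dot / (nu * nv))


def representativeness(selected, feats):
    """Facility-location score: each segment is credited with its best match in the summary."""
    if not selected:
        return 0.0
    return sum(max(sim(f, feats[j]) for j in selected) for f in feats)


def interestingness(selected, feats, query):
    """Modular score: similarity of each selected segment to a (hypothetical) text-query embedding."""
    return sum(sim(feats[j], query) for j in selected)


def greedy_summary(feats, query, budget, w_rep=1.0, w_int=1.0):
    """Pick up to `budget` segment indices by greedy marginal-gain maximization."""
    def score(s):
        return (w_rep * representativeness(s, feats)
                + w_int * interestingness(s, feats, query))

    selected = []
    while len(selected) < budget:
        base = score(selected)
        best, best_gain = None, 0.0
        for j in range(len(feats)):
            if j in selected:
                continue
            gain = score(selected + [j]) - base
            if best is None or gain > best_gain:
                best, best_gain = j, gain
        if best is None:  # no candidates left
            break
        selected.append(best)
    return sorted(selected)  # return in temporal order


# Toy data: four segments forming two visual clusters, and a query near the second cluster.
segments = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
text_query = [0.0, 1.0]
balanced = greedy_summary(segments, text_query, budget=2)
query_heavy = greedy_summary(segments, text_query, budget=2, w_int=10.0)
```

With equal weights the selection covers both clusters, while a large `w_int` pulls both picks toward the query, which loosely mirrors the text-guided customization the abstract mentions. In practice the feature vectors would come from a learned vision-language embedding rather than hand-written lists.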
Author Lazebnik, Svetlana
Brown, Matthew
Plummer, Bryan A.
Author_xml – sequence: 1
  givenname: Bryan A.
  surname: Plummer
  fullname: Plummer, Bryan A.
  email: bplumme2@illinois.edu
  organization: Univ. of Illinois at Urbana Champaign, Champaign, IL, USA
– sequence: 2
  givenname: Matthew
  surname: Brown
  fullname: Brown, Matthew
  email: mtbr@google.com
– sequence: 3
  givenname: Svetlana
  surname: Lazebnik
  fullname: Lazebnik, Svetlana
  email: slazebni@illinois.edu
  organization: Univ. of Illinois at Urbana Champaign, Champaign, IL, USA
CODEN IEEPAD
ContentType Conference Proceeding
DOI 10.1109/CVPR.2017.118
Discipline Applied Sciences
Computer Science
EISBN 1538604574
9781538604571
EndPage 1060
ExternalDocumentID 8099601
Genre orig-research
ISSN 1063-6919
IsPeerReviewed false
IsScholarly true
Language English
PageCount 9
PublicationCentury 2000
PublicationDate 2017-July
PublicationDateYYYYMMDD 2017-07-01
PublicationDate_xml – month: 07
  year: 2017
  text: 2017-July
PublicationDecade 2010
PublicationTitle 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
PublicationTitleAbbrev CVPR
PublicationYear 2017
Publisher IEEE
Publisher_xml – name: IEEE
StartPage 1052
SubjectTerms Cameras
Google
Image segmentation
Optimization
Semantics
Visualization
Title Enhancing Video Summarization via Vision-Language Embedding
URI https://ieeexplore.ieee.org/document/8099601