Enhancing Video Summarization via Vision-Language Embedding
This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story. We show that visual representations supervised by freeform language make a good fit for this application by extending a recent submodular summarization app...
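The abstract describes selecting summary segments by optimizing representativeness and interestingness objectives over embedding features. This record does not give the paper's exact formulation, so the following is only a minimal sketch under stated assumptions: representativeness is modeled with a generic facility-location coverage term, interestingness with cosine similarity to a text query embedding, and the two are combined by a weighted sum that is maximized greedily. The function names, the weights `w_rep` and `w_int`, and the input arrays are all hypothetical.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row to unit length so dot products are cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, 1e-12)


def greedy_summary(segment_emb, query_emb, budget, w_rep=1.0, w_int=1.0):
    """Greedily pick up to `budget` segment indices that maximize
    w_rep * representativeness + w_int * interestingness.

    Assumption: both objectives are illustrative stand-ins, not the
    objectives defined in the paper.
    """
    seg = l2_normalize(np.asarray(segment_emb, dtype=float))
    query = l2_normalize(np.asarray(query_emb, dtype=float).reshape(1, -1))[0]

    sim = np.clip(seg @ seg.T, 0.0, None)       # pairwise similarity, kept >= 0
    interest = np.clip(seg @ query, 0.0, None)  # similarity to the text query

    n = seg.shape[0]
    available = np.ones(n, dtype=bool)
    coverage = np.zeros(n)  # best similarity of each segment to the summary so far
    selected = []
    for _ in range(min(budget, n)):
        # Facility-location marginal gain of adding each candidate column j.
        rep_gain = np.maximum(sim, coverage[:, None]).sum(axis=0) - coverage.sum()
        gain = w_rep * rep_gain + w_int * interest
        gain[~available] = -np.inf  # never pick the same segment twice
        best = int(np.argmax(gain))
        selected.append(best)
        available[best] = False
        coverage = np.maximum(coverage, sim[:, best])
    return sorted(selected)
```

In this sketch, changing `query_emb` at test time shifts which segments score as interesting, which loosely mirrors the text-guided customization the abstract mentions. Both terms are monotone (one submodular, one modular), so greedy selection is a reasonable heuristic here.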
Published in | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) pp. 1052 - 1060 |
---|---|
Main Authors | Plummer, Bryan A.; Brown, Matthew; Lazebnik, Svetlana |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.07.2017 |
Subjects | Cameras; Image segmentation; Optimization; Semantics; Visualization |
Abstract | This paper addresses video summarization, or the problem of distilling a raw video into a shorter form while still capturing the original story. We show that visual representations supervised by freeform language make a good fit for this application by extending a recent submodular summarization approach [9] with representativeness and interestingness objectives computed on features from a joint vision-language embedding space. We perform an evaluation on two diverse datasets, UT Egocentric [18] and TV Episodes [45], and show that our new objectives give improved summarization ability compared to standard visual features alone. Our experiments also show that the vision-language embedding need not be trained on domain-specific data, but can be learned from standard still image vision-language datasets and transferred to video. A further benefit of our model is the ability to guide a summary using freeform text input at test time, allowing user customization. |
---|---|
Author | Lazebnik, Svetlana; Brown, Matthew; Plummer, Bryan A. |
Author_xml | – sequence: 1 givenname: Bryan A. surname: Plummer fullname: Plummer, Bryan A. email: bplumme2@illinois.edu organization: Univ. of Illinois at Urbana Champaign, Champaign, IL, USA – sequence: 2 givenname: Matthew surname: Brown fullname: Brown, Matthew email: mtbr@google.com – sequence: 3 givenname: Svetlana surname: Lazebnik fullname: Lazebnik, Svetlana email: slazebni@illinois.edu organization: Univ. of Illinois at Urbana Champaign, Champaign, IL, USA |
CODEN | IEEPAD |
CitedBy_id | crossref_primary_10_3390_app14114400 crossref_primary_10_3390_electronics12071735 crossref_primary_10_1007_s00530_022_01040_3 crossref_primary_10_1109_TCSVT_2022_3225549 crossref_primary_10_1007_s11042_019_08175_y crossref_primary_10_35377_saucis___1139765 crossref_primary_10_1109_TMM_2019_2930041 crossref_primary_10_1109_TIE_2019_2931283 crossref_primary_10_1016_j_eswa_2019_04_065 crossref_primary_10_1109_TCSVT_2019_2898899 crossref_primary_10_1049_iet_ipr_2020_0234 crossref_primary_10_1109_JPROC_2021_3117472 crossref_primary_10_3390_app11167266 crossref_primary_10_1109_TITS_2019_2929618 crossref_primary_10_1109_TIP_2020_2985868 crossref_primary_10_1109_TPAMI_2022_3157198 crossref_primary_10_1007_s10462_021_10104_1 crossref_primary_10_1145_3445794 crossref_primary_10_1109_TMM_2023_3266615 crossref_primary_10_1177_02783649211069154 crossref_primary_10_1109_TMM_2020_2987683 crossref_primary_10_1007_s11042_023_14925_w crossref_primary_10_1109_TPAMI_2020_2983929 crossref_primary_10_1109_TMM_2019_2960594 crossref_primary_10_1016_j_procs_2024_04_142 crossref_primary_10_1145_3495211 |
ContentType | Conference Proceeding |
DOI | 10.1109/CVPR.2017.118 |
Discipline | Applied Sciences; Computer Science |
EISBN | 1538604574 9781538604571 |
EndPage | 1060 |
ExternalDocumentID | 8099601 |
Genre | orig-research |
ISSN | 1063-6919 |
IngestDate | Wed Jun 26 19:27:38 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
PageCount | 9 |
ParticipantIDs | ieee_primary_8099601 |
PublicationCentury | 2000 |
PublicationDate | 2017-July |
PublicationDateYYYYMMDD | 2017-07-01 |
PublicationDate_xml | – month: 07 year: 2017 text: 2017-July |
PublicationDecade | 2010 |
PublicationTitle | 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) |
PublicationTitleAbbrev | CVPR |
PublicationYear | 2017 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SourceID | ieee |
SourceType | Publisher |
StartPage | 1052 |
SubjectTerms | Cameras; Image segmentation; Optimization; Semantics; Visualization |
Title | Enhancing Video Summarization via Vision-Language Embedding |
URI | https://ieeexplore.ieee.org/document/8099601 |