SummEval: Re-evaluating Summarization Evaluation

Bibliographic Details
Published in Transactions of the Association for Computational Linguistics Vol. 9; pp. 391-409
Main Authors Fabbri, Alexander R., Kryściński, Wojciech, McCann, Bryan, Xiong, Caiming, Socher, Richard, Radev, Dragomir
Format Journal Article
Language English
Published MIT Press, One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2021
Abstract The scarcity of comprehensive, up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations; 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics; 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format; 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics; and 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/DailyMail dataset, annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments.
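As a rough illustration of what it means for an automatic metric to "correlate with human judgments", the sketch below computes Kendall's tau between one metric's scores and human ratings for the same summaries. It is not taken from the paper or its toolkit: the abstract does not name a correlation statistic, so Kendall's tau (the tau-a variant) is an assumption, and the function name `kendall_tau` and all scores are hypothetical.

```python
from itertools import combinations


def kendall_tau(xs, ys):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(xs) != len(ys):
        raise ValueError("score lists must have the same length")
    if len(xs) < 2:
        raise ValueError("need at least two items to form a pair")

    concordant = 0
    discordant = 0
    for i, j in combinations(range(len(xs)), 2):
        # The sign of the product tells whether both scorers order the pair alike.
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1

    n_pairs = len(xs) * (len(xs) - 1) // 2
    return (concordant - discordant) / n_pairs


# Hypothetical data: one automatic metric score and one human rating
# for each of five model-generated summaries.
metric_scores = [0.31, 0.45, 0.28, 0.52, 0.40]
human_scores = [3.0, 4.0, 2.5, 4.5, 3.0]

tau = kendall_tau(metric_scores, human_scores)
print(f"Kendall's tau: {tau:.2f}")
```

A value near 1 means the metric ranks summaries much as the human raters do, a value near -1 means it ranks them in the opposite order, and a value near 0 means the two rankings are largely unrelated.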
Author affiliations
Fabbri, Alexander R.: Yale University, United States (alexander.fabbri@yale.edu)
Kryściński, Wojciech: Salesforce Research, United States (kryscinski@salesforce.com)
McCann, Bryan: Salesforce Research, United States (bryan.mccann.is@gmail.com)
Xiong, Caiming: Salesforce Research, United States (cxiong@salesforce.com)
Socher, Richard: Salesforce Research, United States (richard@socher.org)
Radev, Dragomir: Salesforce Research, United States (dragomir.radev@yale.edu)
ContentType Journal Article
Copyright 2021. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
DOI 10.1162/tacl_a_00373
EISSN 2307-387X
EndPage 409
ISSN 2307-387X
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
PageCount 19
PublicationDate 2021-01-01
References Clark (2021060823395851900_bib6) 2019
Dang (2021060823395851900_bib9) 2008
Krippendorff (2021060823395851900_bib29) 2011
See (2021060823395851900_bib58) 2017
Chaganty (2021060823395851900_bib4) 2018
Gao (2021060823395851900_bib18) 2020
Bouscarrat (2021060823395851900_bib3) 2019
Papineni (2021060823395851900_bib47) 2002
Zhou (2021060823395851900_bib77) 2006
Yuxiang (2021060823395851900_bib69) 2018
Louis (2021060823395851900_bib40) 2013; 39
Zhou (2021060823395851900_bib78) 2018
Deutsch (2021060823395851900_bib12) 2020
Sutskever (2021060823395851900_bib62) 2014
Popović (2021060823395851900_bib52) 2015
Zhang (2021060823395851900_bib71) 2018
Wang (2021060823395851900_bib67) 2020
Graham (2021060823395851900_bib21) 2015
Ziegler (2021060823395851900_bib79) 2019
Kryściński (2021060823395851900_bib31) 2020
Narayan (2021060823395851900_bib44) 2018
Raffel (2021060823395851900_bib54) 2019
Peyrard (2021060823395851900_bib51) 2017
Cohan (2021060823395851900_bib7) 2016
Lin (2021060823395851900_bib36) 2004
Zhao (2021060823395851900_bib76) 2019
Li (2021060823395851900_bib14) 2019
Lewis (2021060823395851900_bib35) 2019
Sharma (2021060823395851900_bib60) 2019
Bahdanau (2021060823395851900_bib1) 2014
Kryściński (2021060823395851900_bib30) 2019
Hsu (2021060823395851900_bib26) 2018
Vedantam (2021060823395851900_bib65) 2015
Vinyals (2021060823395851900_bib66) 2015
Lavie (2021060823395851900_bib34) 2007
Liu (2021060823395851900_bib38) 2008
Ganesan (2021060823395851900_bib17) 2015
Zhang (2021060823395851900_bib74) 2018
Nallapati (2021060823395851900_bib43) 2016
Devlin (2021060823395851900_bib13) 2019
Stiennon (2021060823395851900_bib61) 2020
Peyrard (2021060823395851900_bib50) 2019
Durmus (2021060823395851900_bib16) 2020
Jiacheng (2021060823395851900_bib70) 2019
Jiang (2021060823395851900_bib27) 2018
Ng (2021060823395851900_bib45) 2015
Kedzie (2021060823395851900_bib28) 2018
Maynez (2021060823395851900_bib41) 2020
Scialom (2021060823395851900_bib57) 2019
Hardy (2021060823395851900_bib24) 2019
Zhang (2021060823395851900_bib73) 2020
Lin (2021060823395851900_bib37) 2004
Guo (2021060823395851900_bib23) 2018
Sandhaus (2021060823395851900_bib56) 2008; 6
Kryściński (2021060823395851900_bib32) 2018
Dong (2021060823395851900_bib15) 2018
Dang (2021060823395851900_bib10) 2009
Owczarzak (2021060823395851900_bib46) 2012
Rankel (2021060823395851900_bib55) 2013
Hermann (2021060823395851900_bib25) 2015
Pasunuru (2021060823395851900_bib48) 2018
Dang (2021060823395851900_bib8) 2005
Vaswani (2021060823395851900_bib64) 2017
Vasilyev (2021060823395851900_bib63) 2020
Zhang (2021060823395851900_bib72) 2019
Liu (2021060823395851900_bib39) 2019
Zhang (2021060823395851900_bib75) 2019
Mikolov (2021060823395851900_bib42) 2013
Chen (2021060823395851900_bib5) 2018
Williams (2021060823395851900_bib68) 1992; 8
Böhm (2021060823395851900_bib2) 2019
Radford (2021060823395851900_bib53) 2019; 1
Grusky (2021060823395851900_bib22) 2018
Gillick (2021060823395851900_bib20) 2010
Paulus (2021060823395851900_bib49) 2017
Kusner (2021060823395851900_bib33) 2015
Gehrmann (2021060823395851900_bib19) 2018
Dernoncourt (2021060823395851900_bib11) 2018
ShafieiBavani (2021060823395851900_bib59) 2018
References_xml – start-page: 708
  volume-title: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
  year: 2018
  ident: 2021060823395851900_bib22
  article-title: Newsroom: A dataset of 1.3 million summaries with diverse extractive strategies
– year: 2019
  ident: 2021060823395851900_bib35
  article-title: BART: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension
  publication-title: arXiv preprint arXiv:1910.13461
– volume-title: NTCIR
  year: 2004
  ident: 2021060823395851900_bib36
  article-title: Looking for a few good metrics: Automatic summarization evaluation-how many samples are enough?
– start-page: 243
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics: Student Research Workshop
  year: 2019
  ident: 2021060823395851900_bib3
  article-title: STRASS: A light and effective method for extractive summarization based on sentence embeddings
  doi: 10.18653/v1/P19-2034
– start-page: 1
  volume-title: Proceedings of the document understanding conference
  year: 2005
  ident: 2021060823395851900_bib8
  article-title: Overview of DUC 2005
– start-page: 148
  volume-title: Proceedings of the NAACL HLT 2010 Workshop on Creating Speech and Language Data with Amazon’s Mechanical Turk
  year: 2010
  ident: 2021060823395851900_bib20
  article-title: Non-expert evaluation of summarization systems is risky
– start-page: 447
  volume-title: Proceedings of the Human Language Technology Conference of the NAACL, Main Conference
  year: 2006
  ident: 2021060823395851900_bib77
  article-title: ParaEval: Using paraphrases to evaluate summaries automatically
– start-page: 128
  volume-title: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
  year: 2015
  ident: 2021060823395851900_bib21
  article-title: Re-evaluating automatic summarization with BLEU and 192 shades of ROUGE
  doi: 10.18653/v1/D15-1013
– start-page: 74
  volume-title: Text Summarization Branches Out
  year: 2004
  ident: 2021060823395851900_bib37
  article-title: ROUGE: A package for automatic evaluation of summaries
– volume: 1
  start-page: 9
  issue: 8
  year: 2019
  ident: 2021060823395851900_bib53
  article-title: Language models are unsupervised multitask learners
  publication-title: OpenAI Blog
– start-page: 392
  volume-title: Proceedings of the Tenth Workshop on Statistical Machine Translation
  year: 2015
  ident: 2021060823395851900_bib52
  article-title: chrF: character n-gram F-score for automatic MT evaluation
  doi: 10.18653/v1/W15-3049
– start-page: 3110
  volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  year: 2019
  ident: 2021060823395851900_bib2
  article-title: Better rewards yield better summaries: Learning to summarise without references
  doi: 10.18653/v1/D19-1307
– start-page: 132
  volume-title: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  year: 2018
  ident: 2021060823395851900_bib26
  article-title: A unified model for extractive and abstractive summarization using inconsistency loss
  doi: 10.18653/v1/P18-1013
– start-page: 4067
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib27
  article-title: Closed-book training to improve summarization encoder memory
  doi: 10.18653/v1/D18-1440
– start-page: 9332
  volume-title: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)
  year: 2020
  ident: 2021060823395851900_bib31
  article-title: Evaluating the factual consistency of abstractive text summarization
  doi: 10.18653/v1/2020.emnlp-main.750
– start-page: 646
  volume-title: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers)
  year: 2018
  ident: 2021060823395851900_bib48
  article-title: Multi-reward reinforced summarization with saliency and entailment
– start-page: 3111
  volume-title: Advances in Neural Information Processing Systems 26
  year: 2013
  ident: 2021060823395851900_bib42
  article-title: Distributed representations of words and phrases and their compositionality
– start-page: 1925
  volume-title: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing
  year: 2015
  ident: 2021060823395851900_bib45
  article-title: Better summarization evaluation with word embeddings for ROUGE
  doi: 10.18653/v1/D15-1222
– volume: 8
  start-page: 229
  issue: 3–4
  year: 1992
  ident: 2021060823395851900_bib68
  article-title: Simple statistical gradient-following algorithms for connectionist reinforcement learning
  publication-title: Machine Learning
  doi: 10.1007/BF00992696
– year: 2019
  ident: 2021060823395851900_bib54
  article-title: Exploring the limits of transfer learning with a unified text-to-text transformer
  publication-title: arXiv e-prints
– year: 2020
  ident: 2021060823395851900_bib12
  article-title: SacreROUGE: An Open-Source Library for Using and Developing Summarization Evaluation Metrics
  doi: 10.18653/v1/2020.nlposs-1.17
– volume-title: Proceedings of the Text Analysis Conference
  year: 2009
  ident: 2021060823395851900_bib10
  article-title: Overview of the TAC 2009 summarization track
– start-page: 5093
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2021060823395851900_bib50
  article-title: Studying summarization evaluation metrics in the appropriate scoring range
  doi: 10.18653/v1/P19-1502
– start-page: 563
  volume-title: EMNLP-IJCNLP 2019
  year: 2019
  ident: 2021060823395851900_bib76
  article-title: MoverScore: Text generation evaluating with contextualized embeddings and earth mover distance
– start-page: 643
  volume-title: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  year: 2018
  ident: 2021060823395851900_bib4
  article-title: The price of debiasing automatic metrics in natural language evaluation
  doi: 10.18653/v1/P18-1060
– volume-title: Thirty-Second AAAI Conference on Artificial Intelligence
  year: 2018
  ident: 2021060823395851900_bib69
  article-title: Learning to extract coherent summary via deep reinforcement learning
– start-page: 4566
  volume-title: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition
  year: 2015
  ident: 2021060823395851900_bib65
  article-title: CIDEr: Consensus-based image description evaluation
– start-page: 4171
  volume-title: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)
  year: 2019
  ident: 2021060823395851900_bib13
  article-title: BERT: Pre-training of deep bidirectional transformers for language understanding
– volume: 39
  start-page: 267
  issue: 2
  year: 2013
  ident: 2021060823395851900_bib40
  article-title: Automatically assessing machine summary content without a gold standard
  publication-title: Computational Linguistics
  doi: 10.1162/COLI_a_00123
– start-page: 785
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib71
  article-title: On the abstractiveness of neural document summarization
  doi: 10.18653/v1/D18-1089
– start-page: 131
  volume-title: Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)
  year: 2013
  ident: 2021060823395851900_bib55
  article-title: A decade of automatic content evaluation of news summaries: Reassessing the state of the art
– start-page: 2748
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2021060823395851900_bib6
  article-title: Sentence mover’s similarity: Automatic evaluation for multi-sentence texts
  doi: 10.18653/v1/P19-1264
– volume-title: TAC
  year: 2008
  ident: 2021060823395851900_bib9
  article-title: Overview of the TAC 2008 update summarization task.
– volume-title: Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018)
  year: 2018
  ident: 2021060823395851900_bib11
  article-title: A repository of corpora for summarization
– year: 2017
  ident: 2021060823395851900_bib49
  article-title: A deep reinforced model for abstractive summarization
  publication-title: arXiv preprint arXiv:1705.04304
– start-page: 1347
  volume-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5-10, 2020
  year: 2020
  ident: 2021060823395851900_bib18
  article-title: SUPERT: towards new frontiers in unsupervised evaluation metrics for multi-document summarization
– start-page: 1808
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib32
  article-title: Improving abstraction in text summarization
  doi: 10.18653/v1/D18-1207
– start-page: 1818
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib28
  article-title: Content selection in deep learning models of summarization
  doi: 10.18653/v1/D18-1208
– year: 2011
  ident: 2021060823395851900_bib29
  article-title: Computing krippendorff’s alpha-reliability
– start-page: 13042
  volume-title: Advances in Neural Information Processing Systems
  year: 2019
  ident: 2021060823395851900_bib14
  article-title: Unified language model pre-training for natural language understanding and generation
– start-page: 4098
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib19
  article-title: Bottom-up abstractive summarization
  doi: 10.18653/v1/D18-1443
– year: 2019
  ident: 2021060823395851900_bib79
  article-title: Fine-tuning language models from human preferences
  publication-title: arXiv preprint arXiv:1909.08593
– start-page: 1693
  volume-title: Advances in Neural Information Processing Systems
  year: 2015
  ident: 2021060823395851900_bib25
  article-title: Teaching machines to read and comprehend
– start-page: 3730
  volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  year: 2019
  ident: 2021060823395851900_bib39
  article-title: Text summarization with pretrained encoders
– start-page: 3292
  volume-title: EMNLP-IJCNLP 2019
  year: 2019
  ident: 2021060823395851900_bib70
  article-title: Neural extractive text summarization with syntactic compression
– start-page: 806
  volume-title: Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16)
  year: 2016
  ident: 2021060823395851900_bib7
  article-title: Revisiting summarization evaluation for scientific articles
– start-page: 1073
  volume-title: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  year: 2017
  ident: 2021060823395851900_bib58
  article-title: Get to the point: Summarization with pointer-generator networks
  doi: 10.18653/v1/P17-1099
– start-page: 654
  volume-title: ACL 2018
  year: 2018
  ident: 2021060823395851900_bib78
  article-title: Neural document summarization by jointly learning to score and select sentences
– start-page: 311
  volume-title: Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics
  year: 2002
  ident: 2021060823395851900_bib47
  article-title: BLEU: A method for automatic evaluation of machine translation
– start-page: 5998
  volume-title: Advances in Neural Information Processing Systems
  year: 2017
  ident: 2021060823395851900_bib64
  article-title: Attention is all you need
– year: 2020
  ident: 2021060823395851900_bib63
  article-title: Fill in the BLANC: human-free quality estimation of document summaries
  publication-title: CoRR
– start-page: 687
  volume-title: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  year: 2018
  ident: 2021060823395851900_bib23
  article-title: Soft layer-specific multi-task summarization with entailment and question generation
  doi: 10.18653/v1/P18-1064
– start-page: 74
  volume-title: Proceedings of the Workshop on New Frontiers in Summarization
  year: 2017
  ident: 2021060823395851900_bib51
  article-title: Learning to score system summaries for better content selection evaluation.
  doi: 10.18653/v1/W17-4510
– start-page: 3381
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2021060823395851900_bib24
  article-title: HighRES: Highlight-based reference-less evaluation of summarization
  doi: 10.18653/v1/P19-1330
– start-page: 359
  volume-title: The 50th Annual Meeting of the Association for Computational Linguistics, Proceedings of the Conference, July 8–14, 2012, Jeju Island, Korea - Volume 2: Short Papers
  year: 2012
  ident: 2021060823395851900_bib46
  article-title: Assessing the effect of inconsistent assessors on summarization evaluation
– start-page: 228
  volume-title: Proceedings of the Second Workshop on Statistical Machine Translation
  year: 2007
  ident: 2021060823395851900_bib34
  article-title: METEOR: An automatic metric for MT evaluation with high levels of correlation with human judgments
  doi: 10.3115/1626355.1626389
– year: 2016
  ident: 2021060823395851900_bib43
  article-title: Abstractive text summarization using sequence-to-sequence RNNs and beyond
  publication-title: arXiv preprint arXiv:1602.06023
– year: 2020
  ident: 2021060823395851900_bib61
  article-title: Learning to summarize from human feedback
  publication-title: CoRR
– start-page: 5059
  volume-title: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics
  year: 2019
  ident: 2021060823395851900_bib75
  article-title: HIBERT: Document level pre-training of hierarchical bidirectional transformers for document summarization
  doi: 10.18653/v1/P19-1499
– start-page: 1747
  volume-title: Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers)
  year: 2018
  ident: 2021060823395851900_bib44
  article-title: Ranking sentences for extractive summarization with reinforcement learning
– start-page: 540
  volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  year: 2019
  ident: 2021060823395851900_bib30
  article-title: Neural text summarization: A critical evaluation
  doi: 10.18653/v1/D19-1051
– start-page: 779
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib74
  article-title: Neural latent extractive document summarization
  doi: 10.18653/v1/D18-1088
– start-page: 3739
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib15
  article-title: BanditSum: Extractive summarization as a contextual bandit
  doi: 10.18653/v1/D18-1409
– start-page: 1906
  volume-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, July 5–10, 2020
  year: 2020
  ident: 2021060823395851900_bib41
  article-title: On faithfulness and factuality in abstractive summarization
– start-page: 5008
  volume-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
  year: 2020
  ident: 2021060823395851900_bib67
  article-title: Asking and answering questions to evaluate the factual consistency of summaries
  doi: 10.18653/v1/2020.acl-main.450
– volume-title: International Conference on Learning Representations
  year: 2020
  ident: 2021060823395851900_bib73
  article-title: BERTScore: Evaluating text generation with BERT
– start-page: 957
  volume-title: International Conference on Machine Learning
  year: 2015
  ident: 2021060823395851900_bib33
  article-title: From word embeddings to document distances
– start-page: 2692
  volume-title: Advances in Neural Information Processing Systems
  year: 2015
  ident: 2021060823395851900_bib66
  article-title: Pointer networks
– year: 2015
  ident: 2021060823395851900_bib17
  article-title: ROUGE 2.0: Updated and improved measures for evaluation of summarization tasks
– start-page: 3104
  volume-title: Advances in Neural Information Processing Systems
  year: 2014
  ident: 2021060823395851900_bib62
  article-title: Sequence to sequence learning with neural networks
– year: 2014
  ident: 2021060823395851900_bib1
  article-title: Neural machine translation by jointly learning to align and translate
  publication-title: arXiv preprint arXiv:1409.0473
– start-page: 201
  volume-title: Proceedings of ACL-08: HLT, Short Papers
  year: 2008
  ident: 2021060823395851900_bib38
  article-title: Correlation between ROUGE and human evaluation of extractive meeting summaries
– volume: 6
  start-page: e26752
  issue: 12
  year: 2008
  ident: 2021060823395851900_bib56
  article-title: The New York Times annotated corpus
  publication-title: Linguistic Data Consortium, Philadelphia
– start-page: 3246
  volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  year: 2019
  ident: 2021060823395851900_bib57
  article-title: Answers unite! Unsupervised metrics for reinforced summarization models
  doi: 10.18653/v1/D19-1320
– start-page: 3280
  volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)
  year: 2019
  ident: 2021060823395851900_bib60
  article-title: An entity-driven framework for abstractive summarization
  doi: 10.18653/v1/D19-1323
– start-page: 5055
  volume-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics
  year: 2020
  ident: 2021060823395851900_bib16
  article-title: FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization
  doi: 10.18653/v1/2020.acl-main.454
– start-page: 762
  volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing
  year: 2018
  ident: 2021060823395851900_bib59
  article-title: A graph-theoretic summary evaluation for ROUGE
  doi: 10.18653/v1/D18-1085
– year: 2019
  ident: 2021060823395851900_bib72
  article-title: PEGASUS: Pre-training with extracted gap-sentences for abstractive summarization
  publication-title: arXiv preprint arXiv:1912.08777
– start-page: 675
  volume-title: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)
  year: 2018
  ident: 2021060823395851900_bib5
  article-title: Fast abstractive summarization with reinforce-selected sentence rewriting
  doi: 10.18653/v1/P18-1063
StartPage 391
SubjectTerms Annotations
Automatic summarization
Crowdsourcing
Datasets
Linguistics
Summaries
Title SummEval: Re-evaluating Summarization Evaluation
URI https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00373
https://www.proquest.com/docview/2893885921
https://doaj.org/article/3a815d182a8440e1ac02f25f1d9da002
Volume 9