SummEval: Re-evaluating Summarization Evaluation
Published in | Transactions of the Association for Computational Linguistics Vol. 9; pp. 391 - 409 |
---|---|
Main Authors | Fabbri, Alexander R.; Kryściński, Wojciech; McCann, Bryan; Xiong, Caiming; Socher, Richard; Radev, Dragomir |
Format | Journal Article |
Language | English |
Published | MIT Press (MIT Press Journals, The; The MIT Press), One Rogers Street, Cambridge, MA 02142-1209, USA, 01.01.2021 |
Abstract | The scarcity of comprehensive, up-to-date studies on evaluation metrics for text summarization and the lack of consensus regarding evaluation protocols continue to inhibit progress. We address the existing shortcomings of summarization evaluation methods along five dimensions: 1) we re-evaluate 14 automatic evaluation metrics in a comprehensive and consistent fashion using neural summarization model outputs along with expert and crowd-sourced human annotations; 2) we consistently benchmark 23 recent summarization models using the aforementioned automatic evaluation metrics; 3) we assemble the largest collection of summaries generated by models trained on the CNN/DailyMail news dataset and share it in a unified format; 4) we implement and share a toolkit that provides an extensible and unified API for evaluating summarization models across a broad range of automatic metrics; and 5) we assemble and share the largest and most diverse, in terms of model types, collection of human judgments of model-generated summaries on the CNN/DailyMail dataset, annotated by both expert judges and crowd-sourced workers. We hope that this work will help promote a more complete evaluation protocol for text summarization as well as advance research in developing evaluation metrics that better correlate with human judgments. |
---|---|
Author | Fabbri, Alexander R.; Xiong, Caiming; Kryściński, Wojciech; Radev, Dragomir; McCann, Bryan; Socher, Richard |
Author_xml | 1. Fabbri, Alexander R. (Yale University, United States; alexander.fabbri@yale.edu); 2. Kryściński, Wojciech (Salesforce Research, United States; kryscinski@salesforce.com); 3. McCann, Bryan (Salesforce Research, United States; bryan.mccann.is@gmail.com); 4. Xiong, Caiming (Salesforce Research, United States; cxiong@salesforce.com); 5. Socher, Richard (Salesforce Research, United States; richard@socher.org); 6. Radev, Dragomir (Salesforce Research, United States; dragomir.radev@yale.edu) |
CitedBy_id | crossref_primary_10_3934_aci_2024001 crossref_primary_10_1017_S1351324923000177 crossref_primary_10_1016_j_jbi_2023_104358 crossref_primary_10_1134_S1995080223080115 crossref_primary_10_1162_coli_a_00502 crossref_primary_10_1109_TKDE_2024_3509715 crossref_primary_10_1007_s10506_023_09349_8 crossref_primary_10_1145_3652951 crossref_primary_10_1016_j_engappai_2024_108231 crossref_primary_10_1016_j_neunet_2024_106417 crossref_primary_10_1162_tacl_a_00576 crossref_primary_10_1162_tacl_a_00417 crossref_primary_10_1007_s10115_024_02217_0 crossref_primary_10_2196_68998 crossref_primary_10_1162_tacl_a_00453 crossref_primary_10_1162_tacl_a_00695 crossref_primary_10_3389_frai_2023_1223924 crossref_primary_10_1016_j_eswa_2024_124456 crossref_primary_10_1007_s10462_023_10582_5 crossref_primary_10_1016_j_infsof_2022_106922 crossref_primary_10_1145_3584700 crossref_primary_10_21603_2782_4799_2024_3_3_203_222 crossref_primary_10_1145_3527546_3527561 crossref_primary_10_1016_j_ijmedinf_2024_105443 crossref_primary_10_1016_j_knosys_2024_112570 crossref_primary_10_1145_3703155 crossref_primary_10_1109_TSE_2021_3136169 crossref_primary_10_1016_j_eswa_2023_121364 crossref_primary_10_1186_s40537_024_00950_5 crossref_primary_10_3390_info14060303 crossref_primary_10_1016_S2589_7500_24_00111_0 crossref_primary_10_1016_j_jbi_2023_104533 crossref_primary_10_1080_09544828_2023_2301230 crossref_primary_10_36548_jei_2021_4_006 crossref_primary_10_1016_j_aei_2022_101649 crossref_primary_10_1162_tacl_a_00506 crossref_primary_10_1038_s41746_024_01091_y crossref_primary_10_1007_s10278_024_00985_3 crossref_primary_10_1162_tacl_a_00632 crossref_primary_10_3390_info14040250 crossref_primary_10_1145_3485766 crossref_primary_10_6339_24_JDS1149 crossref_primary_10_1109_ACCESS_2024_3377463 crossref_primary_10_1016_j_eswa_2025_127234 crossref_primary_10_1007_s13278_024_01323_9 crossref_primary_10_1109_ACCESS_2023_3292300 crossref_primary_10_3389_frai_2024_1375419 
crossref_primary_10_1145_3529754 crossref_primary_10_1145_3583558 crossref_primary_10_1016_j_jcmg_2024_05_021 crossref_primary_10_2903_sp_efsa_2023_EN_8223 crossref_primary_10_3389_frai_2024_1200949 crossref_primary_10_3390_informatics10010005 crossref_primary_10_1109_ACCESS_2022_3197769 crossref_primary_10_1051_e3sconf_202561903005 crossref_primary_10_7232_JKIIE_2024_50_2_097 crossref_primary_10_1016_j_jbi_2024_104640 crossref_primary_10_1016_j_knosys_2025_112969 crossref_primary_10_1038_s41746_023_00896_7 crossref_primary_10_1038_s41746_024_01239_w crossref_primary_10_1145_3597307 crossref_primary_10_1162_tacl_a_00702 crossref_primary_10_1162_tacl_a_00703 crossref_primary_10_3390_app14020713 crossref_primary_10_1142_S0218213024500179 crossref_primary_10_1016_j_nlp_2024_100080 crossref_primary_10_1016_j_eswa_2025_126592 crossref_primary_10_1109_ACCESS_2023_3322226 crossref_primary_10_1162_coli_a_00519 crossref_primary_10_1093_database_baad031 crossref_primary_10_1162_tacl_a_00583 |
Cites_doi | 10.18653/v1/P19-2034 10.18653/v1/D15-1013 10.18653/v1/W15-3049 10.18653/v1/D19-1307 10.18653/v1/P18-1013 10.18653/v1/D18-1440 10.18653/v1/2020.emnlp-main.750 10.18653/v1/D15-1222 10.1007/BF00992696 10.18653/v1/2020.nlposs-1.17 10.18653/v1/P19-1502 10.18653/v1/P18-1060 10.1162/COLI_a_00123 10.18653/v1/D18-1089 10.18653/v1/P19-1264 10.18653/v1/D18-1207 10.18653/v1/D18-1208 10.18653/v1/D18-1443 10.18653/v1/P17-1099 10.18653/v1/P18-1064 10.18653/v1/W17-4510 10.18653/v1/P19-1330 10.3115/1626355.1626389 10.18653/v1/P19-1499 10.18653/v1/D19-1051 10.18653/v1/D18-1088 10.18653/v1/D18-1409 10.18653/v1/2020.acl-main.450 10.18653/v1/D19-1320 10.18653/v1/D19-1323 10.18653/v1/2020.acl-main.454 10.18653/v1/D18-1085 10.18653/v1/P18-1063 |
ContentType | Journal Article |
Copyright | 2021. This work is published under https://creativecommons.org/licenses/by/4.0/legalcode (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License. |
DOI | 10.1162/tacl_a_00373 |
EISSN | 2307-387X |
EndPage | 409 |
ExternalDocumentID | oai_doaj_org_article_3a815d182a8440e1ac02f25f1d9da002 10_1162_tacl_a_00373 tacl_a_00373.pdf |
ISSN | 2307-387X |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Language | English |
OpenAccessLink | https://www.proquest.com/docview/2893885921?pq-origsite=%requestingapplication% |
PageCount | 19 |
PublicationDate | 2021-01-01 |
PublicationPlace | One Rogers Street, Cambridge, MA 02142-1209, USA |
PublicationTitle | Transactions of the Association for Computational Linguistics |
PublicationYear | 2021 |
Publisher | MIT Press; MIT Press Journals, The; The MIT Press |
unsupervised metrics for reinforced summarization models doi: 10.18653/v1/D19-1320 – start-page: 3280 volume-title: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP) year: 2019 ident: 2021060823395851900_bib60 article-title: An entity-driven framework for abstractive summarization doi: 10.18653/v1/D19-1323 – start-page: 5055 volume-title: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics year: 2020 ident: 2021060823395851900_bib16 article-title: FEQA: A question answering evaluation framework for faithfulness assessment in abstractive summarization doi: 10.18653/v1/2020.acl-main.454 – start-page: 762 volume-title: Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing year: 2018 ident: 2021060823395851900_bib59 article-title: A graph-theoretic summary evaluation for ROUGE doi: 10.18653/v1/D18-1085 – year: 2019 ident: 2021060823395851900_bib72 article-title: Pegasus: Pre-training with extracted gap-sentences for abstractive summarization publication-title: arXiv preprint arXiv:1912.08777 – start-page: 675 volume-title: Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) year: 2018 ident: 2021060823395851900_bib5 article-title: Fast abstractive summarization with reinforce-selected sentence rewriting doi: 10.18653/v1/P18-1063 |
StartPage | 391 |
SubjectTerms | Annotations; Automatic summarization; Crowdsourcing; Datasets; Linguistics; Summaries |
Title | SummEval: Re-evaluating Summarization Evaluation |
URI | https://direct.mit.edu/tacl/article/doi/10.1162/tacl_a_00373 https://www.proquest.com/docview/2893885921 https://doaj.org/article/3a815d182a8440e1ac02f25f1d9da002 |
Volume | 9 |