Towards Understanding Omission in Dialogue Summarization
Main Authors | Zou, Yicheng; Song, Kaitao; Tan, Xu; Fu, Zhongkai; Zhang, Qi; Li, Dongsheng; Gui, Tao |
---|---|
Format | Journal Article |
Language | English |
Published | 14.11.2022 |
Subjects | Computer Science - Computation and Language |
Online Access | https://arxiv.org/abs/2211.07145 |
Abstract | Dialogue summarization aims to condense a lengthy dialogue into a concise
summary and has recently achieved significant progress. However, the results of
existing methods are still far from satisfactory. Previous works indicated that
omission is a major factor affecting the quality of summarization, but few
of them have further explored the omission problem, such as how omission
affects summarization results and how to detect omission, which is critical for
reducing omission and improving summarization quality. Moreover, analyzing and
detecting omission rely on summarization datasets with omission labels (i.e.,
labels indicating which dialogue utterances are omitted in the summary), which
are not available in the current literature. In this paper, we propose the OLDS
dataset, which provides high-quality Omission Labels for Dialogue
Summarization. By analyzing this dataset, we find that a large improvement in
summarization quality can be achieved by providing ground-truth omission labels
for the summarization model to recover the omitted information, which demonstrates
the importance of omission detection for omission mitigation in dialogue
summarization. We therefore formulate an omission detection task and
demonstrate that our proposed dataset supports the training and evaluation of
this task well. We also call for research action on omission detection based on
our proposed dataset. Our dataset and code are publicly available. |
---|---|
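To make the abstract's notion of an omission label concrete: an omission label marks which dialogue utterances are not covered by a summary. The sketch below is purely illustrative — the function name, label format, and the lexical-overlap heuristic are assumptions for exposition, not the OLDS dataset's actual labeling scheme.

```python
import re

def detect_omissions(utterances, summary, threshold=0.5):
    """Naive lexical-overlap heuristic: flag an utterance as omitted when
    fewer than `threshold` of its word tokens appear in the summary."""
    summary_words = set(re.findall(r"[a-z]+", summary.lower()))
    omitted = []
    for i, utt in enumerate(utterances):
        words = set(re.findall(r"[a-z]+", utt.lower()))
        overlap = len(words & summary_words) / max(len(words), 1)
        if overlap < threshold:
            omitted.append(i)  # index of a dialogue utterance judged omitted
    return omitted

dialogue = [
    "Amanda: I baked cookies today.",
    "Amanda: My oven broke halfway through.",
    "Jerry: Save me some cookies!",
]
summary = "Amanda baked cookies and will save some for Jerry."
print(detect_omissions(dialogue, summary))  # → [1]
```

Here the second utterance (the broken oven) has almost no overlap with the summary, so it is labeled omitted; a real labeling scheme would of course need semantic matching rather than word overlap.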
Author | Zou, Yicheng; Song, Kaitao; Tan, Xu; Fu, Zhongkai; Zhang, Qi; Li, Dongsheng; Gui, Tao |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2211.07145 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/2211.07145 |
SecondaryResourceType | preprint |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Computation and Language |
Title | Towards Understanding Omission in Dialogue Summarization |
URI | https://arxiv.org/abs/2211.07145 |
linkProvider | Cornell University |