Towards Understanding Omission in Dialogue Summarization
Main Authors | Zou, Yicheng; Song, Kaitao; Tan, Xu; Fu, Zhongkai; Zhang, Qi; Li, Dongsheng; Gui, Tao |
---|---|
Format | Journal Article |
Language | English |
Published | 14.11.2022 |
Subjects | Computer Science - Computation and Language |
Online Access | https://arxiv.org/abs/2211.07145 |
Abstract | Dialogue summarization aims to condense a lengthy dialogue into a concise
summary and has recently achieved significant progress. However, the results of
existing methods are still far from satisfactory. Previous works indicated that
omission is a major factor affecting the quality of summarization, but few
of them have further explored the omission problem, such as how omission
affects summarization results and how to detect omission, which is critical for
reducing omission and improving summarization quality. Moreover, analyzing and
detecting omission rely on summarization datasets with omission labels (i.e.,
labels indicating which dialogue utterances are omitted in the summary), which
are not available in the current literature. In this paper, we propose the OLDS
dataset, which provides high-quality Omission Labels for Dialogue
Summarization. By analyzing this dataset, we find that a large improvement in
summarization quality can be achieved by providing ground-truth omission labels
for the summarization model to recover the omitted information, which demonstrates
the importance of omission detection for omission mitigation in dialogue
summarization. We therefore formulate an omission detection task and
demonstrate that our proposed dataset supports the training and evaluation of
this task well. We also call for research action on omission detection based on
our proposed dataset. Our dataset and code are publicly available. |
---|---|
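To make the abstract's notion of an omission label concrete: an omission label marks which dialogue utterances are not covered by a summary. The sketch below is purely illustrative — the function name, label format, and the lexical-overlap heuristic are assumptions for exposition, not the OLDS dataset's actual labeling scheme.

```python
import re

def detect_omissions(utterances, summary, threshold=0.5):
    """Naive lexical-overlap heuristic: flag an utterance as omitted when
    fewer than `threshold` of its word tokens appear in the summary."""
    summary_words = set(re.findall(r"[a-z]+", summary.lower()))
    omitted = []
    for i, utt in enumerate(utterances):
        words = set(re.findall(r"[a-z]+", utt.lower()))
        overlap = len(words & summary_words) / max(len(words), 1)
        if overlap < threshold:
            omitted.append(i)  # index of a dialogue utterance judged omitted
    return omitted

dialogue = [
    "Amanda: I baked cookies today.",
    "Amanda: My oven broke halfway through.",
    "Jerry: Save me some cookies!",
]
summary = "Amanda baked cookies and will save some for Jerry."
print(detect_omissions(dialogue, summary))  # → [1]
```

Here the second utterance (the broken oven) has almost no overlap with the summary, so it is labeled omitted; a real labeling scheme would of course need semantic matching rather than word overlap.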
Author | Zou, Yicheng; Song, Kaitao; Tan, Xu; Fu, Zhongkai; Zhang, Qi; Li, Dongsheng; Gui, Tao |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2211.07145 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | false |
OpenAccessLink | https://arxiv.org/abs/2211.07145 |
SecondaryResourceType | preprint |
SourceType | Open Access Repository |
SubjectTerms | Computer Science - Computation and Language |
Title | Towards Understanding Omission in Dialogue Summarization |
URI | https://arxiv.org/abs/2211.07145 |
linkProvider | Cornell University |