Towards Understanding Omission in Dialogue Summarization

Bibliographic Details
Main Authors: Zou, Yicheng; Song, Kaitao; Tan, Xu; Fu, Zhongkai; Zhang, Qi; Li, Dongsheng; Gui, Tao
Format: Journal Article
Language: English
Published: 14.11.2022
Subjects: Computer Science - Computation and Language
DOI: 10.48550/arxiv.2211.07145
Online Access: https://arxiv.org/abs/2211.07145
Copyright: http://arxiv.org/licenses/nonexclusive-distrib/1.0
Source: arXiv.org (Open Access Repository)

Abstract
Dialogue summarization aims to condense a lengthy dialogue into a concise summary and has recently achieved significant progress. However, the results of existing methods are still far from satisfactory. Previous works indicated that omission is a major factor affecting summarization quality, but few of them have further explored the omission problem, such as how omission affects summarization results and how to detect omission, which is critical for reducing omission and improving summarization quality. Moreover, analyzing and detecting omission relies on summarization datasets with omission labels (i.e., which dialogue utterances are omitted in the summarization), which are not available in the current literature. In this paper, we propose the OLDS dataset, which provides high-quality Omission Labels for Dialogue Summarization. By analyzing this dataset, we find that a large improvement in summarization quality can be achieved by providing ground-truth omission labels for the summarization model to recover omitted information, which demonstrates the importance of omission detection for omission mitigation in dialogue summarization. Therefore, we formulate an omission detection task and demonstrate that our proposed dataset can support the training and evaluation of this task well. We also call for research action on omission detection based on our proposed dataset. Our dataset and code are publicly available.
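To make the omission-label format and the detection task described in the abstract concrete, below is a minimal Python sketch. It is not taken from the paper or its released code; the field names, the example dialogue, and the lexical-overlap heuristic are illustrative assumptions only. It represents omission labels as a binary vector over dialogue utterances and uses a naive word-overlap detector purely as a stand-in baseline.

```python
# Illustrative sketch of an omission-labeled example and a naive baseline detector.
# NOT the OLDS schema or the paper's method; all names and the heuristic are assumptions.
from dataclasses import dataclass
from typing import List


@dataclass
class OmissionExample:
    dialogue: List[str]          # dialogue utterances, in order
    candidate_summary: str       # summary produced by a summarization model
    omission_labels: List[int]   # 1 if the utterance's content is missing from the summary, else 0


def overlap_baseline(example: OmissionExample, threshold: float = 0.3) -> List[int]:
    """Flag an utterance as omitted when its word overlap with the summary is low.

    This lexical heuristic only illustrates the task's input/output format;
    it is not the detection approach proposed in the paper.
    """
    summary_tokens = set(example.candidate_summary.lower().split())
    predictions = []
    for utterance in example.dialogue:
        tokens = set(utterance.lower().split())
        overlap = len(tokens & summary_tokens) / max(len(tokens), 1)
        predictions.append(1 if overlap < threshold else 0)
    return predictions


if __name__ == "__main__":
    # Hypothetical example: the second utterance is not reflected in the summary.
    example = OmissionExample(
        dialogue=["A: The meeting moved to Friday.", "B: OK, I will update the invite."],
        candidate_summary="The meeting was moved to Friday.",
        omission_labels=[0, 1],
    )
    print(overlap_baseline(example))  # -> [0, 1]
```

With ground-truth omission labels such as those provided by OLDS, a learned utterance-level classifier could be trained and evaluated in place of this heuristic.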