EnDex: Evaluation of Dialogue Engagingness at Scale

Findings of EMNLP 2022 We propose EnDex, the first human-reaction based model to evaluate dialogue engagingness. EnDex is trained on 80k Reddit-based Engagement Dataset (RED) curated using a novel distant-supervision framework. Engagingness is a key measure that captures high-level quality of AI dia...

Bibliographic Details
Main Authors Xu, Guangxuan; Liu, Ruibo; Harel-Canada, Fabrice; Chandra, Nischal Reddy; Peng, Nanyun
Format Journal Article
Language English
Published 22.10.2022
Subjects Computer Science - Computation and Language
Online Access Get full text

Abstract Findings of EMNLP 2022. We propose EnDex, the first human-reaction based model to evaluate dialogue engagingness. EnDex is trained on the 80k Reddit-based Engagement Dataset (RED), curated using a novel distant-supervision framework. Engagingness is a key measure that captures the high-level quality of AI dialogue systems and closely reflects actual user experience. However, data shortage, together with the abstract and broad definition of engagingness, makes it challenging to develop an automatic metric. Our work departs from mainstream approaches that use synthetic negative examples to train binary classifiers and instead proposes a solution using distant supervision from human-reaction feedback. To support the soundness of our EnDex metric, we offer a theoretical foundation for engagement, an extensive ablation study, and empirical evidence of high correlation on five engagingness-related datasets. We will release the code, an off-the-shelf EnDex model, and a large-scale dataset upon paper publication to facilitate future research.
Author Chandra, Nischal Reddy
Xu, Guangxuan
Peng, Nanyun
Harel-Canada, Fabrice
Liu, Ruibo
Author_xml – sequence: 1
  givenname: Guangxuan
  surname: Xu
  fullname: Xu, Guangxuan
– sequence: 2
  givenname: Ruibo
  surname: Liu
  fullname: Liu, Ruibo
– sequence: 3
  givenname: Fabrice
  surname: Harel-Canada
  fullname: Harel-Canada, Fabrice
– sequence: 4
  givenname: Nischal Reddy
  surname: Chandra
  fullname: Chandra, Nischal Reddy
– sequence: 5
  givenname: Nanyun
  surname: Peng
  fullname: Peng, Nanyun
BackLink https://doi.org/10.48550/arXiv.2210.12362 (View paper in arXiv)
ContentType Journal Article
Copyright http://creativecommons.org/licenses/by/4.0
Copyright_xml – notice: http://creativecommons.org/licenses/by/4.0
DBID AKY
GOX
DOI 10.48550/arxiv.2210.12362
DatabaseName arXiv Computer Science
arXiv.org
DatabaseTitleList
Database_xml – sequence: 1
  dbid: GOX
  name: arXiv.org
  url: http://arxiv.org/find
  sourceTypes: Open Access Repository
DeliveryMethod fulltext_linktorsrc
ExternalDocumentID 2210_12362
GroupedDBID AKY
GOX
ID FETCH-LOGICAL-a672-7bfba7bba1c5b2e78b825edb03acef435bddab9eed7fc761ec6988c144b921113
IEDL.DBID GOX
IngestDate Mon Jan 08 05:45:15 EST 2024
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
Language English
LinkModel DirectLink
MergedId FETCHMERGED-LOGICAL-a672-7bfba7bba1c5b2e78b825edb03acef435bddab9eed7fc761ec6988c144b921113
OpenAccessLink https://arxiv.org/abs/2210.12362
ParticipantIDs arxiv_primary_2210_12362
PublicationCentury 2000
PublicationDate 2022-10-22
PublicationDateYYYYMMDD 2022-10-22
PublicationDate_xml – month: 10
  year: 2022
  text: 2022-10-22
  day: 22
PublicationDecade 2020
PublicationYear 2022
Score 1.8647913
SecondaryResourceType preprint
SourceID arxiv
SourceType Open Access Repository
SubjectTerms Computer Science - Computation and Language
Title EnDex: Evaluation of Dialogue Engagingness at Scale
URI https://arxiv.org/abs/2210.12362
hasFullText 1
inHoldings 1
isFullTextHit
isPrint
linkProvider Cornell University