EnDex: Evaluation of Dialogue Engagingness at Scale

Bibliographic Details
Main Authors	Xu, Guangxuan, Liu, Ruibo, Harel-Canada, Fabrice, Chandra, Nischal Reddy, Peng, Nanyun
Format	Journal Article
Language	English
Published	22.10.2022
Subjects	Computer Science - Computation and Language
Online Access	https://arxiv.org/abs/2210.12362

Abstract	Findings of EMNLP 2022. We propose EnDex, the first human-reaction-based model for evaluating dialogue engagingness. EnDex is trained on the 80k Reddit-based Engagement Dataset (RED), curated using a novel distant-supervision framework. Engagingness is a key measure that captures the high-level quality of AI dialogue systems and closely reflects actual user experience. However, the shortage of data, together with the abstract and broad definition of engagingness, makes it challenging to develop an automatic metric. Our work departs from mainstream approaches that train binary classifiers on synthetic negative examples; instead, it uses distant supervision from human-reaction feedback. To support the soundness of the EnDex metric, we offer a theoretical foundation for engagement, an extensive ablation study, and empirical evidence of high correlation on five engagingness-related datasets. We will release code, an off-the-shelf EnDex model, and a large-scale dataset upon publication to facilitate future research.
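
The abstract reports high correlation on five engagingness-related datasets but does not say which coefficient is used. As a hedged illustration of how an automatic engagingness metric is commonly checked against human ratings, the sketch below computes Spearman's rank correlation (with average ranks for ties) between metric scores and human scores for the same responses. The function names and example scores are hypothetical and are not taken from the paper or its released code.

```python
from collections.abc import Sequence


def average_ranks(values: Sequence[float]) -> list[float]:
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at sorted position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (vx * vy) ** 0.5


def spearman(metric_scores: Sequence[float], human_scores: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(metric_scores) != len(human_scores) or len(metric_scores) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    return pearson(average_ranks(metric_scores), average_ranks(human_scores))


if __name__ == "__main__":
    # Hypothetical metric outputs and human engagingness ratings for five responses.
    metric = [0.91, 0.35, 0.62, 0.10, 0.78]
    human = [5, 2, 3, 1, 4]
    print(spearman(metric, human))  # prints 1.0
```

Rank correlation is a reasonable default here because it only asks whether the metric orders responses the way humans do, not whether its scores sit on the same scale as the human ratings.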
Copyright	Creative Commons Attribution 4.0 International (CC BY 4.0), http://creativecommons.org/licenses/by/4.0
DOI	10.48550/arxiv.2210.12362
IsOpenAccess	true
IsPeerReviewed	false
OpenAccessLink	https://arxiv.org/abs/2210.12362
SecondaryResourceType	preprint