Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model
Published in | Transactions of the Japanese Society for Artificial Intelligence, Vol. 33, No. 1, pp. DSH-F_1–12
---|---
Main Authors | Inoue, Koji; Lala, Divesh; Yoshii, Kazuyoshi; Takanashi, Katsuya; Kawahara, Tatsuya
Format | Journal Article
Language | Japanese
Published | The Japanese Society for Artificial Intelligence, 2018-01-01
Subjects | behavior; character; dialogue; engagement; latent model
Online Access | Get full text
Abstract | This article addresses the estimation of engagement level from the listener’s behaviors, such as backchannels, laughing, head nodding, and eye-gaze. Engagement is defined as the degree to which a user is interested in and willing to continue the current interaction. When the engagement level is rated by multiple annotators, the annotation criteria may differ from annotator to annotator. We assume that each annotator has their own character, which affects how they perceive the engagement level. We propose a latent character model that estimates the engagement level together with the character of each annotator as a latent variable. Experimental results show that the latent character model predicts each annotator’s engagement labels with higher accuracy than models that do not take the character into account. |
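The latent character model described in the abstract can be illustrated with a minimal sketch: each annotator is assigned a latent "character" that governs their labeling criterion, and the model is fit with EM over a mixture of per-character classifiers. This is not the paper's implementation; the binary engagement labels, synthetic behavioral features, mixture-of-logistic-regressions form, and all parameter values below are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: N stimuli with D behavioral features (e.g. backchannel,
# laughing, nodding counts), A annotators, K latent characters.
# Each annotator labels every stimulus; the labeling criterion is
# governed by that annotator's hidden character.
rng = np.random.default_rng(0)
N, D, A, K = 200, 3, 12, 2
X = rng.normal(size=(N, D))
true_w = np.array([[2.0, 1.0, 0.0],    # character 0 keys on features 0 and 1
                   [-1.0, 0.5, 2.0]])  # character 1 keys on feature 2
true_z = rng.integers(K, size=A)       # hidden character of each annotator
Y = (rng.random((A, N)) < sigmoid(X @ true_w[true_z].T).T).astype(float)

# EM for a mixture of per-character logistic regressions.
W = rng.normal(scale=0.1, size=(K, D))  # per-character weight vectors
pi = np.full(K, 1.0 / K)                # prior over characters
for _ in range(50):
    # E-step: responsibility of each character for each annotator,
    # from the log-likelihood of all of that annotator's labels.
    P = sigmoid(X @ W.T)                                          # (N, K)
    ll = Y @ np.log(P + 1e-12) + (1 - Y) @ np.log(1 - P + 1e-12)  # (A, K)
    ll += np.log(pi)
    R = np.exp(ll - ll.max(axis=1, keepdims=True))
    R /= R.sum(axis=1, keepdims=True)                             # (A, K)
    # M-step: update the prior, then take a few responsibility-weighted
    # gradient steps on each character's classifier.
    pi = R.mean(axis=0)
    for _ in range(3):
        P = sigmoid(X @ W.T)
        for k in range(K):
            err = (R[:, k][:, None] * (Y - P[:, k])).sum(axis=0)  # (N,)
            W[k] += 0.5 * X.T @ err / (R[:, k].sum() * N + 1e-12)

# Predict each annotator's labels by mixing the character classifiers
# with that annotator's inferred responsibilities.
P = sigmoid(X @ W.T)
pred = (R @ P.T) > 0.5  # (A, N)
acc = (pred == Y).mean()
```

A character-agnostic baseline would pool all annotators into one classifier; the point of the latent variable is that annotators with conflicting criteria no longer cancel each other out during training.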
Author | Yoshii, Kazuyoshi Kawahara, Tatsuya Lala, Divesh Inoue, Koji Takanashi, Katsuya
Author_xml | 1. Inoue, Koji (Graduate School of Informatics, Kyoto University); 2. Lala, Divesh (Graduate School of Informatics, Kyoto University); 3. Yoshii, Kazuyoshi (Graduate School of Informatics, Kyoto University); 4. Takanashi, Katsuya (Graduate School of Informatics, Kyoto University); 5. Kawahara, Tatsuya (Graduate School of Informatics, Kyoto University)
ContentType | Journal Article |
Copyright | The Japanese Society for Artificial Intelligence 2018 |
DOI | 10.1527/tjsai.DSH-F |
Discipline | Computer Science |
EISSN | 1346-8030 |
EndPage | 12 |
ISSN | 1346-0714 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 1 |
OpenAccessLink | https://www.jstage.jst.go.jp/article/tjsai/33/1/33_DSH-F/_article/-char/en |
PublicationDate | 2018/01/01 |
PublicationTitle | Transactions of the Japanese Society for Artificial Intelligence |
PublicationYear | 2018 |
Publisher | The Japanese Society for Artificial Intelligence |
StartPage | DSH-F_1 |
SubjectTerms | behavior; character; dialogue; engagement; latent model
Title | Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model |
URI | https://www.jstage.jst.go.jp/article/tjsai/33/1/33_DSH-F/_article/-char/en |
Volume | 33 |