Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model

Bibliographic Details
Published in: Transactions of the Japanese Society for Artificial Intelligence, Vol. 33, No. 1, pp. DSH-F_1–12
Main Authors: Inoue, Koji; Lala, Divesh; Yoshii, Kazuyoshi; Takanashi, Katsuya; Kawahara, Tatsuya
Format: Journal Article
Language: Japanese
Published: The Japanese Society for Artificial Intelligence, 01.01.2018
Subjects: behavior; character; dialogue; engagement; latent model
Online Access: https://www.jstage.jst.go.jp/article/tjsai/33/1/33_DSH-F/_article/-char/en

Abstract: This article addresses the estimation of the engagement level based on listener behaviors such as backchannels, laughing, head nodding, and eye-gaze. Engagement is defined as the degree to which a user is interested in and willing to continue the current interaction. When the engagement level is evaluated by multiple annotators, the annotation criteria depend on the individual annotator. We assume that each annotator has his or her own character, which affects how the engagement level is perceived. We propose a latent character model that estimates the engagement level together with the character of each annotator as a latent variable. Experimental results show that the latent character model predicts each annotator's engagement label with higher accuracy than models that do not take the character into account.
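To make the modeling idea in the abstract concrete, the following is a minimal, hypothetical Python sketch (not the authors' implementation) of one way a latent character model can be set up: each annotator carries a latent character, the character selects a character-specific classifier from listener-behavior features to an engagement label, and both are estimated with EM. The logistic observation model, the toy data, and all variable names are illustrative assumptions.

```python
# Hypothetical sketch of a latent character model, NOT the paper's exact model:
# annotator -> latent character k in {0..K-1}; character k has its own
# classifier from behavior features to a binary engagement label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy data: each row is one labeled session from one annotator.
n_annotators, n_per_annot, n_feat, K = 6, 40, 4, 2
annot = np.repeat(np.arange(n_annotators), n_per_annot)    # annotator id per row
X = rng.normal(size=(n_annotators * n_per_annot, n_feat))  # behavior features
true_char = np.arange(n_annotators) % K                    # hidden ground truth
w_true = 2.0 * rng.normal(size=(K, n_feat))
logit = np.einsum("nd,nd->n", X, w_true[true_char[annot]])
y = (logit + rng.logistic(size=len(logit)) > 0).astype(int)

# EM: alternate fitting character-specific classifiers (M-step) and updating
# each annotator's posterior over characters (E-step).
resp = rng.dirichlet(np.ones(K), size=n_annotators)        # q(character | annotator)
clfs = [LogisticRegression(max_iter=1000) for _ in range(K)]
for _ in range(20):
    for k in range(K):                                     # M-step
        clfs[k].fit(X, y, sample_weight=resp[annot, k] + 1e-8)
    pi = resp.mean(axis=0)                                 # prior over characters
    log_resp = np.tile(np.log(pi), (n_annotators, 1))      # E-step
    for k in range(K):
        ll = clfs[k].predict_log_proba(X)[np.arange(len(y)), y]
        np.add.at(log_resp[:, k], annot, ll)               # sum log-lik per annotator
    log_resp -= log_resp.max(axis=1, keepdims=True)        # normalize in log space
    resp = np.exp(log_resp)
    resp /= resp.sum(axis=1, keepdims=True)

# Characters are identifiable only up to relabeling (label switching).
print("inferred:", resp.argmax(axis=1), " true:", true_char)
```

Because the characters are exchangeable, the inferred assignments match the ground truth only up to a permutation of the K character labels; predicting a given annotator's engagement label then amounts to averaging the character-specific classifiers under that annotator's posterior.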
Affiliations: All authors: Graduate School of Informatics, Kyoto University
Copyright: The Japanese Society for Artificial Intelligence, 2018
DOI: 10.1527/tjsai.DSH-F
Discipline: Computer Science
EISSN: 1346-8030
ISSN: 1346-0714