Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model
Published in | Transactions of the Japanese Society for Artificial Intelligence, Vol. 33, No. 1, pp. DSH-F_1–12
---|---
Main Authors | Inoue, Koji; Lala, Divesh; Yoshii, Kazuyoshi; Takanashi, Katsuya; Kawahara, Tatsuya
Format | Journal Article
Language | Japanese
Published | The Japanese Society for Artificial Intelligence, 2018-01-01
Subjects | behavior; character; dialogue; engagement; latent model
Online Access | Get full text
Abstract | This article addresses the estimation of engagement level from the listener’s behaviors, such as backchannels, laughing, head nodding, and eye-gaze. Engagement is defined as the degree to which a user is interested in and willing to continue the current interaction. When the engagement level is rated by multiple annotators, the annotation criteria may differ from annotator to annotator. We assume that each annotator has their own character, which affects how they perceive the engagement level. We propose a latent character model that estimates the engagement level together with the character of each annotator as a latent variable. Experimental results show that the latent character model predicts each annotator’s engagement labels with higher accuracy than models that do not take the character into account. |
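The latent character model described in the abstract can be illustrated with a minimal sketch: each annotator is assigned a latent "character" that governs their labeling criterion, and the model is fit with EM over a mixture of per-character classifiers. This is not the paper's implementation; the binary engagement labels, synthetic behavioral features, mixture-of-logistic-regressions form, and all parameter values below are assumptions for illustration only.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy data: N stimuli with D behavioral features (e.g. backchannel,
# laughing, nodding counts), A annotators, K latent characters.
# Each annotator labels every stimulus; the labeling criterion is
# governed by that annotator's hidden character.
rng = np.random.default_rng(0)
N, D, A, K = 200, 3, 12, 2
X = rng.normal(size=(N, D))
true_w = np.array([[2.0, 1.0, 0.0],    # character 0 keys on features 0 and 1
                   [-1.0, 0.5, 2.0]])  # character 1 keys on feature 2
true_z = rng.integers(K, size=A)       # hidden character of each annotator
Y = (rng.random((A, N)) < sigmoid(X @ true_w[true_z].T).T).astype(float)

# EM for a mixture of per-character logistic regressions.
W = rng.normal(scale=0.1, size=(K, D))  # per-character weight vectors
pi = np.full(K, 1.0 / K)                # prior over characters
for _ in range(50):
    # E-step: responsibility of each character for each annotator,
    # from the log-likelihood of all of that annotator's labels.
    P = sigmoid(X @ W.T)                                          # (N, K)
    ll = Y @ np.log(P + 1e-12) + (1 - Y) @ np.log(1 - P + 1e-12)  # (A, K)
    ll += np.log(pi)
    R = np.exp(ll - ll.max(axis=1, keepdims=True))
    R /= R.sum(axis=1, keepdims=True)                             # (A, K)
    # M-step: update the prior, then take a few responsibility-weighted
    # gradient steps on each character's classifier.
    pi = R.mean(axis=0)
    for _ in range(3):
        P = sigmoid(X @ W.T)
        for k in range(K):
            err = (R[:, k][:, None] * (Y - P[:, k])).sum(axis=0)  # (N,)
            W[k] += 0.5 * X.T @ err / (R[:, k].sum() * N + 1e-12)

# Predict each annotator's labels by mixing the character classifiers
# with that annotator's inferred responsibilities.
P = sigmoid(X @ W.T)
pred = (R @ P.T) > 0.5  # (A, N)
acc = (pred == Y).mean()
```

A character-agnostic baseline would pool all annotators into one classifier; the point of the latent variable is that annotators with conflicting criteria no longer cancel each other out during training.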
Author | Yoshii, Kazuyoshi Kawahara, Tatsuya Lala, Divesh Inoue, Koji Takanashi, Katsuya
Author_xml | 1. Inoue, Koji (Graduate School of Informatics, Kyoto University); 2. Lala, Divesh (Graduate School of Informatics, Kyoto University); 3. Yoshii, Kazuyoshi (Graduate School of Informatics, Kyoto University); 4. Takanashi, Katsuya (Graduate School of Informatics, Kyoto University); 5. Kawahara, Tatsuya (Graduate School of Informatics, Kyoto University)
ContentType | Journal Article |
Copyright | The Japanese Society for Artificial Intelligence 2018 |
DOI | 10.1527/tjsai.DSH-F |
Discipline | Computer Science |
EISSN | 1346-8030 |
EndPage | 12 |
ISSN | 1346-0714 |
IsDoiOpenAccess | true |
IsOpenAccess | true |
IsPeerReviewed | false |
IsScholarly | true |
Issue | 1 |
OpenAccessLink | https://www.jstage.jst.go.jp/article/tjsai/33/1/33_DSH-F/_article/-char/en |
PublicationDate | 2018/01/01 |
PublicationTitle | Transactions of the Japanese Society for Artificial Intelligence |
PublicationYear | 2018 |
Publisher | The Japanese Society for Artificial Intelligence |
StartPage | DSH-F_1 |
SubjectTerms | behavior; character; dialogue; engagement; latent model
Title | Engagement Recognition from Listener’s Behaviors in Spoken Dialogue Using a Latent Character Model |
URI | https://www.jstage.jst.go.jp/article/tjsai/33/1/33_DSH-F/_article/-char/en |
Volume | 33 |