A hierarchical depression detection model based on vocal and emotional cues
Published in | Neurocomputing (Amsterdam), Vol. 441, pp. 279–290 |
---|---|
Main Authors | Dong, Yizhuo; Yang, Xinyu |
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 21.06.2021 |
Subjects | Depression detection; Feature variation coordination; Hierarchical model; Pretrained model |
Abstract | Effective and efficient automatic depression diagnosis is a challenging problem in affective computing. Because speech signals carry useful information for diagnosing depression, this paper proposes to extract deep speaker recognition (SR) and speech emotion recognition (SER) features with pretrained models and to fuse the two deep speech features, exploiting the complementary information between speakers' vocal and emotional differences. In addition, because data for depression recognition are scarce and the diagnosis results are cost-sensitive, we propose a hierarchical depression detection model in which multiple classifiers are placed before a regressor to guide the prediction of depression severity. We evaluate the method on the AVEC 2013 and AVEC 2014 benchmark databases. The results show that fusing deep SR and SER features improves the model's prediction performance. Using only audio features, the proposed method avoids overfitting and outperforms previous audio-based methods on both databases, while yielding results comparable to those of video-based and multimodal methods for depression detection. |
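The abstract describes two ideas that can be illustrated concretely: feature-level fusion of pretrained SR and SER embeddings, and a hierarchical classify-then-regress scheme for predicting a severity score. The sketch below is not the authors' implementation; the embedding sizes, severity cut-offs, and scikit-learn estimators are illustrative assumptions standing in for the models described in the paper, and the random arrays stand in for real speech features and labels.

```python
# Minimal sketch of the fused-feature, classify-then-regress idea from the abstract.
# Embedding dimensions, severity bands, and estimators are assumptions for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Stand-ins for deep speaker-recognition (SR) and speech-emotion-recognition (SER)
# embeddings; in the paper these come from pretrained models.
n_samples = 200
sr_feats = rng.normal(size=(n_samples, 512))    # assumed SR embedding dimension
ser_feats = rng.normal(size=(n_samples, 256))   # assumed SER embedding dimension
severity = rng.uniform(0, 45, size=n_samples)   # depression scores (e.g., BDI-II range)

# 1) Feature-level fusion: concatenate the two deep speech representations.
fused = np.concatenate([sr_feats, ser_feats], axis=1)

# 2) Hierarchical prediction: a classifier assigns a coarse severity band first,
#    then a regressor trained on that band refines the continuous score.
bands = np.digitize(severity, bins=[14, 28])    # 0 low, 1 moderate, 2 severe (assumed cut-offs)

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(fused, bands)
regressors = {
    b: SVR(kernel="rbf").fit(fused[bands == b], severity[bands == b])
    for b in np.unique(bands)
}

def predict_severity(x):
    """Route one fused feature vector through the classifier, then the band's regressor."""
    band = clf.predict(x.reshape(1, -1))[0]
    return float(regressors[band].predict(x.reshape(1, -1))[0])

print(predict_severity(fused[0]))
```

The abstract only states that multiple classifiers precede a regressor to guide the severity prediction; the per-band regressors above are one simple way such guidance could be realized, not the paper's exact mechanism.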
Author | Yang, Xinyu; Dong, Yizhuo |
Author_xml | – sequence: 1 givenname: Yizhuo surname: Dong fullname: Dong, Yizhuo organization: Department of Computer Science and Technology, Xi’an Jiaotong University, China – sequence: 2 givenname: Xinyu surname: Yang fullname: Yang, Xinyu organization: Department of Computer Science and Technology, Xi’an Jiaotong University, China |
CitedBy_id | crossref_primary_10_3389_fpsyt_2024_1466507 crossref_primary_10_1016_j_bspc_2023_104970 crossref_primary_10_1145_3631452 crossref_primary_10_1016_j_bspc_2023_105046 crossref_primary_10_1049_cit2_12174 crossref_primary_10_1109_TAFFC_2023_3272553 crossref_primary_10_1016_j_jad_2022_11_060 crossref_primary_10_32604_iasc_2023_033360 crossref_primary_10_1016_j_compbiolchem_2024_108232 crossref_primary_10_1109_TKDE_2024_3350071 crossref_primary_10_1007_s11042_023_18076_w crossref_primary_10_7717_peerj_cs_2301 crossref_primary_10_1016_j_smhl_2022_100282 crossref_primary_10_1049_cit2_12113 crossref_primary_10_3390_bioengineering11030219 crossref_primary_10_3390_electronics12020328 crossref_primary_10_3390_electronics11050676 crossref_primary_10_3390_s22197561 crossref_primary_10_1109_TCDS_2023_3273614 crossref_primary_10_1109_ACCESS_2024_3426670 crossref_primary_10_1080_20590776_2022_2131389 crossref_primary_10_1016_j_heliyon_2024_e25959 crossref_primary_10_1109_TAFFC_2022_3179478 crossref_primary_10_1016_j_bspc_2023_105675 crossref_primary_10_1007_s00521_021_06426_4 crossref_primary_10_1016_j_neucom_2022_04_084 crossref_primary_10_1049_sil2_12207 crossref_primary_10_1097_HRP_0000000000000356 crossref_primary_10_1016_j_engappai_2025_110354 crossref_primary_10_1038_s41598_025_88313_9 crossref_primary_10_1016_j_compbiomed_2023_106741 crossref_primary_10_1016_j_bspc_2023_105704 crossref_primary_10_1111_aphw_12639 crossref_primary_10_2139_ssrn_4180783 crossref_primary_10_1016_j_inffus_2021_10_012 |
Cites_doi | 10.1016/j.specom.2015.03.004 10.1016/j.neucom.2019.08.046 10.1145/3133944.3133953 10.21437/Interspeech.2019-1617 10.1109/EMBC.2018.8513610 10.1109/TAFFC.2017.2740923 10.1145/2661806.2661807 10.1109/TCDS.2017.2721552 10.1186/s13640-017-0212-3 10.1145/3133944.3133949 10.1145/2512530.2512533 10.1016/j.yebeh.2012.07.007 10.1016/j.neucom.2020.01.048 10.1016/j.jbi.2018.05.007 10.1145/3133944.3133945 10.21437/Interspeech.2020-2396 10.1109/TAFFC.2016.2634527 10.1016/j.neucom.2018.03.068 10.1145/2988257.2988258 10.1109/COMPSAC.2017.228 10.1109/FG.2019.8756568 10.1109/EMBC.2017.8037103 10.1109/TAFFC.2017.2766145 10.1109/TAFFC.2015.2440264 10.1109/TAFFC.2018.2828819 10.1016/j.csl.2018.08.004 10.1109/TAFFC.2017.2724035 |
ContentType | Journal Article |
Copyright | 2021 Elsevier B.V. |
DOI | 10.1016/j.neucom.2021.02.019 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
Discipline | Computer Science |
EISSN | 1872-8286 |
EndPage | 290 |
ExternalDocumentID | 10_1016_j_neucom_2021_02_019 S0925231221002654 |
ISSN | 0925-2312 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Depression detection; Hierarchical model; Pretrained model; Feature variation coordination |
Language | English |
PageCount | 12 |
PublicationCentury | 2000 |
PublicationDate | 2021-06-21 |
PublicationDateYYYYMMDD | 2021-06-21 |
PublicationDate_xml | – month: 06 year: 2021 text: 2021-06-21 day: 21 |
PublicationDecade | 2020 |
PublicationTitle | Neurocomputing (Amsterdam) |
PublicationYear | 2021 |
Publisher | Elsevier B.V |
SourceID | crossref elsevier |
SourceType | Enrichment Source Index Database Publisher |
StartPage | 279 |
SubjectTerms | Depression detection; Feature variation coordination; Hierarchical model; Pretrained model |
Title | A hierarchical depression detection model based on vocal and emotional cues |
URI | https://dx.doi.org/10.1016/j.neucom.2021.02.019 |
Volume | 441 |