Emotional stimulated speech-based assisted early diagnosis of depressive disorders using personality-enhanced deep learning


Bibliographic Details
Published in Journal of affective disorders Vol. 376; pp. 177-188
Main Authors Ding, Zhong; Chen, Jing; Zhong, Bao-Liang; Liu, Chen-Ling; Liu, Zhen-Tao
Format Journal Article
Language English
Published Netherlands Elsevier B.V 01.05.2025
Summary: Early diagnosis of depression is crucial, and speech-based early diagnosis is promising, but insufficient data and a lack of theoretical support make it difficult to apply. It is therefore valuable to combine psychiatric theories, collect speech recognition data for depression, and develop a practicable recognition method for depression. In this study, 24 patients with major depressive disorders (MDDs) and 36 healthy controls (HCs) were recruited to participate in a multi-task speech experiment. Descriptive statistics and tests of variance were used to analyze subjects' personality and speech changes. Subsequently, the speech task with the most depressive cues was identified using the Bidirectional Long Short-Term Memory (Bi-LSTM) algorithm, and on that basis a personality-assisted multi-task deep model, i.e., a multi-task attentional temporal convolutional network model (TCN-MTA), was proposed. Statistical analyses of speech duration showed that the fable reading, neutral stimulus, and negative stimulus tasks produced significant differences in subjects' speech duration, and the negative stimulus task showed significant differences between the depressed and control groups (p < 0.001, 0.03, 0.04). Notably, the Big Five personality emotional stability scores differed significantly between the depressed and control groups (p = 0.03). Depression was best identified using Bi-LSTM in negative stimulus speech (Youden index = 0.44) and positive stimulus speech (Youden index = 0.42). Further, the specificity of 0.72 and sensitivity of 0.87 achieved by the proposed TCN-MTA for recognizing depression in negative stimulus speech outperform existing methods. The sample size enrolled in this study is larger than the minimum sample size calculated with G-Power 3.1, but it is still small.
The proposed deep learning-based personality-assisted multitasking method could accurately recognize major depression, which demonstrates the potential of methods that fuse specialized theories with artificial intelligence.
• This study built a personality depression dataset with speech on structured, semi-structured and spontaneous tasks.
• HCs were faster than DDs on the read-aloud task, but took longer than DDs on the other tasks.
• Depression recognition was best based on spontaneous speech and worst based on structured speech.
• The personality-assisted early recognition model of depression (TCN-MTA) was effective (specificity = 0.72, sensitivity = 0.87).
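The summary reports Youden indices for the Bi-LSTM results but only sensitivity and specificity for TCN-MTA. As a minimal sketch of how the two kinds of figures relate, the snippet below applies the standard definition of Youden's index, J = sensitivity + specificity - 1, to the reported TCN-MTA values. The function name `youden_index` is illustrative, and the resulting value is derived here rather than stated in the record; it may not be directly comparable to the Bi-LSTM figures if the evaluation setups differ.

```python
def youden_index(sensitivity: float, specificity: float) -> float:
    """Youden's J statistic: sensitivity + specificity - 1 (standard definition)."""
    return sensitivity + specificity - 1.0


# Values reported in the summary for TCN-MTA on negative stimulus speech.
j_tcn_mta = youden_index(sensitivity=0.87, specificity=0.72)

# Derived value, not a figure quoted from the record.
print(round(j_tcn_mta, 2))  # 0.59
```

J ranges from -1 to 1, where 0 corresponds to chance-level discrimination and 1 to perfect separation of the two groups.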
ISSN: 0165-0327
1573-2517
DOI: 10.1016/j.jad.2025.01.136