LSCAformer: Long and short-term cross-attention-aware transformer for depression recognition from video sequences
Published in | Biomedical signal processing and control Vol. 98; p. 106767 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.12.2024 |
Subjects | |
Summary: | By 2030, depression is expected to be the leading mental disorder in terms of negative impact on individuals and society worldwide. Artificial intelligence (AI) algorithms have the potential to significantly advance depression treatment. Existing deep learning-based architectures for automatically diagnosing a patient's depression severity face two primary challenges: (1) how to effectively learn both long-term and short-term patterns of depression, and (2) how to efficiently merge long-term and short-term depressive features to achieve extended predictions from facial videos. To address these challenges, a novel long short-term cross-attention-aware Transformer (LSCAformer), engineered for video-based depression recognition, is proposed. LSCAformer introduces two components: a long short-term feature extraction (LSTFE) module and a cross-attention-aware Transformer. First, LSTFE employs two separate branches to capture depressive behaviors over long-term and short-term intervals. The cross-attention-aware Transformer then identifies complementary patterns within the long-term and short-term features, employing temporal-directed attention (TDA) to discern complementary temporal patterns across the long- and short-duration branches. On AVEC2013 and AVEC2014, LSCAformer demonstrates superior performance, with a root mean square error (RMSE), mean absolute error (MAE), and concordance correlation coefficient (CCC) of 7.69/5.89/0.868 and 7.55/5.91/0.845, respectively. Additionally, cross-dataset experiments validate the generalization of LSCAformer, yielding an RMSE of 7.21, an MAE of 5.63, and a CCC of 0.874 (AVEC2013 for training, and the Northwind task of AVEC2014 for testing). Moreover, the proposed method can model the complementary behavioral patterns between long-term and short-term sequences for depression recognition. Code will be available at: https://github.com/helang818/LSCAformer/.
• A novel end-to-end LSCAformer for depression recognition is proposed.
• A long and short-term feature extraction (LSTFE) module is designed.
• The method can help clinicians when diagnosing depressed subjects. |
---|---|
ISSN: | 1746-8094 1746-8108 |
DOI: | 10.1016/j.bspc.2024.106767 |
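
The summary reports results as RMSE, MAE, and CCC. As a reading aid, the sketch below computes these three metrics using their standard textbook definitions (CCC in Lin's population-variance form). It is not taken from the LSCAformer repository, and the paper's exact evaluation code may differ. The function names and the sample scores are hypothetical and serve only as illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error between two equal-length score lists."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length score lists."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def ccc(y_true, y_pred):
    """Concordance correlation coefficient (population variances).

    CCC = 2 * cov(t, p) / (var(t) + var(p) + (mean(t) - mean(p))^2)
    """
    n = len(y_true)
    mean_t = sum(y_true) / n
    mean_p = sum(y_pred) / n
    var_t = sum((t - mean_t) ** 2 for t in y_true) / n
    var_p = sum((p - mean_p) ** 2 for p in y_pred) / n
    cov = sum((t - mean_t) * (p - mean_p) for t, p in zip(y_true, y_pred)) / n
    return 2 * cov / (var_t + var_p + (mean_t - mean_p) ** 2)


if __name__ == "__main__":
    # Hypothetical severity scores, NOT results from the paper.
    labels = [3.0, 10.0, 22.0, 35.0]
    preds = [5.0, 9.0, 20.0, 30.0]
    print(f"RMSE={rmse(labels, preds):.3f}")
    print(f"MAE={mae(labels, preds):.3f}")
    print(f"CCC={ccc(labels, preds):.3f}")
```

Lower RMSE and MAE indicate smaller prediction error. CCC approaches 1 as predictions agree with the labels in both correlation and scale, so a constant offset lowers CCC even when the correlation is perfect.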