MFE-Former: Disentangling Emotion-Identity Dynamics via Self-Supervised Learning for Enhancing Speech-Driven Depression Detection
Published in | IEEE Journal of Biomedical and Health Informatics, Vol. PP, pp. 1-12 |
---|---|
Main Authors | , , , , , |
Format | Journal Article |
Language | English |
Published | United States, IEEE, 01.08.2025 |
Subjects | |
Summary: | Acoustic features are crucial behavioral indicators for depression detection. However, prior speech-based depression detection methods often overlook the variability of emotional patterns across samples, which lets speaker identity interfere with the features and hinders the effective extraction of emotional changes. To address this limitation, we developed the Emotional Word Reading Experiment (EWRE) and introduced MFE-Former, a method that combines self-supervised and supervised learning for depression detection from speech. First, we generate fine-grained emotional representations for response segments by computing cosine similarity between intra-sample and inter-sample contexts. Concurrently, orthogonality constraints decouple identity information from emotional features, while a Transformer decoder reconstructs spectral structures to improve sensitivity to depression-related emotional patterns. Next, we propose a multi-scale emotion change perception module and a Bernoulli distribution-based joint decision module that integrate multi-level information for depression detection. By enhancing the distribution differences among positive, neutral, and negative emotional features, we find that patients with depression are more inclined to express negative emotions, whereas healthy individuals express more positive emotions. Experimental results on EWRE and AVEC 2014 show that MFE-Former outperforms state-of-the-art temporal methods when emotional patterns vary across samples. MFE-Former is open-sourced at https://github.com/QLUTEmoTechCrew/MFE-Former . |
---|---|
ISSN: | 2168-2194 2168-2208 |
DOI: | 10.1109/JBHI.2025.3594166 |
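
The summary mentions cosine similarity between contexts and orthogonality constraints that decouple identity from emotional features. The record does not give the actual loss formulation, so the following is only a minimal sketch of the general idea, not the authors' implementation. The function names and the choice of a mean squared-cosine penalty are assumptions made for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention for zero vectors (assumption)
    return dot / (norm_u * norm_v)


def orthogonality_penalty(emotion_feats, identity_feats):
    """Hypothetical penalty: mean squared cosine similarity between
    paired emotion and identity vectors. It is zero when every pair
    is orthogonal and one when every pair is collinear."""
    sims = [
        cosine_similarity(e, i) ** 2
        for e, i in zip(emotion_feats, identity_feats)
    ]
    return sum(sims) / len(sims)
```

Minimizing a penalty of this kind pushes the emotion vectors away from the directions occupied by the identity vectors, which is one common way to express a decoupling constraint. The published method may use a different form.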