Examining the Fourier Spectrum of Speech Signal from a Time-Frequency Perspective for Automatic Depression Level Prediction
Published in | IEEE Transactions on Affective Computing, pp. 1-14 |
---|---|
Format | Journal Article |
Language | English |
Published | IEEE, 2025 |
Summary: | Many current studies use Fourier amplitude spectra of speech signals to predict depression levels. However, these works often treat the amplitude spectra as images or sequences and capture depression cues with convolutional neural networks or multilayer perceptrons. They therefore ignore the complex-valued element composition and time-frequency attributes of Fourier spectra, which hinders capturing the differences among individuals with different depression levels. For this reason, we construct a Time-Frequency Self-Embedding (TFSE) module, which not only stores the correlations among the real (imaginary) parts of the Fourier spectra of different subjects from a time-frequency perspective, but also maintains the physical properties of the data through the weight embedding process. In addition, Global Average Pooling (GAP) and linear layers struggle to balance the temporal and frequency dimensions during vectorization. We therefore construct a Time-Frequency Tensor Vectorization (TFTV) module, which summarizes each channel along the time and frequency dimensions and then generates the vectorization result by integrating the channels. We combine the TFSE and TFTV modules to form our SpectrumFormer model for predicting depression levels. Evaluation results on the AVEC 2013 and AVEC 2014 depression databases indicate the advanced performance of our model. |
---|---|
ISSN: | 1949-3045 |
DOI: | 10.1109/TAFFC.2025.3565654 |
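
The summary contrasts Global Average Pooling with a vectorization that summarizes each channel along both the time and frequency dimensions. The sketch below only illustrates that contrast on the real and imaginary parts of a short-time Fourier spectrum; it is not the authors' TFSE or TFTV module. The function names, frame length, hop size, test signal, and the use of plain averaging are all assumptions made for the example.

```python
import numpy as np


def stft_real_imag(signal, frame_len=256, hop=128):
    """Return a (2, n_freq, n_frames) array: channel 0 is the real part,
    channel 1 the imaginary part of a Hann-windowed short-time spectrum.
    (frame_len and hop are illustrative values, not taken from the paper.)"""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([
        signal[i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ])                                     # (n_frames, frame_len)
    spec = np.fft.rfft(frames, axis=1).T   # (n_freq, n_frames), complex
    return np.stack([spec.real, spec.imag])


def gap_vector(x):
    """Global average pooling: one scalar per channel, so the time and
    frequency structure is collapsed completely."""
    return x.mean(axis=(1, 2))


def time_freq_vector(x):
    """Hypothetical stand-in for a time-frequency vectorization: average
    each channel over time and over frequency separately, then
    concatenate the profiles of all channels into one vector."""
    freq_profile = x.mean(axis=2)          # (channels, n_freq)
    time_profile = x.mean(axis=1)          # (channels, n_frames)
    per_channel = np.concatenate([freq_profile, time_profile], axis=1)
    return per_channel.reshape(-1)


# Synthetic one-second signal at 16 kHz (an assumption for the demo).
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
signal = np.sin(2 * np.pi * 220 * t) + 0.1 * rng.standard_normal(t.size)

x = stft_real_imag(signal)
print(x.shape, gap_vector(x).shape, time_freq_vector(x).shape)
```

With these settings the spectrum tensor has 2 channels, 129 frequency bins, and 124 frames. Global average pooling keeps 2 numbers, while the separate time and frequency profiles keep 506, which is one simple way to see why a single global average can discard structure along both axes.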