Emotion Recognition from rPPG via Physiologically Inspired Temporal Encoding and Attention-Based Curriculum Learning
Published in | Sensors (Basel, Switzerland) Vol. 25; no. 13; p. 3995 |
---|---|
Main Authors | , , |
Format | Journal Article |
Language | English |
Published | Switzerland: MDPI AG, 26.06.2025 |
Summary: | Remote photoplethysmography (rPPG) enables non-contact physiological measurement for emotion recognition, yet the temporally sparse nature of emotional cardiovascular responses, intrinsic measurement noise, weak session-level labels, and subtle correlates of valence pose critical challenges. To address these issues, we propose a physiologically inspired deep learning framework comprising a Multi-scale Temporal Dynamics Encoder (MTDE) that captures autonomic nervous system dynamics across multiple timescales, an adaptive sparse α-Entmax attention mechanism that identifies salient emotional segments amid noisy signals, Gated Temporal Pooling for robust aggregation of emotional features, and a structured three-phase curriculum learning strategy that systematically handles temporal sparsity, weak labels, and noise. Evaluated on the MAHNOB-HCI dataset (27 subjects, 527 sessions, subject-mixed split), our temporal-only model achieved competitive arousal recognition performance (66.04% accuracy; 61.97% weighted F1-score), surpassing prior CNN-LSTM baselines. However, its lower valence performance (62.26% accuracy) revealed the inherent physiological limitations of unimodal temporal cardiovascular analysis. These findings establish clear benchmarks for temporal-only rPPG emotion recognition and underscore the need to incorporate spatial or multimodal information to capture nuanced emotional dimensions such as valence, guiding future research in affective computing. |
---|---|
ISSN: | 1424-8220 |
DOI: | 10.3390/s25133995 |
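The summary names an adaptive sparse α-Entmax attention mechanism but does not give its equations. As a reference point, the sketch below implements the standard α-entmax mapping in plain Python by bisection on the threshold τ, with α held fixed; how the authors adapt α is not stated in the summary, and the function names `entmax_bisect` and `softmax` are illustrative assumptions. Unlike softmax, α-entmax with α > 1 can assign exactly zero weight to low-scoring segments, which is the property that would let attention ignore noisy stretches of an rPPG signal.

```python
import math
from typing import List, Sequence


def softmax(z: Sequence[float]) -> List[float]:
    """Dense baseline: every entry receives strictly positive weight."""
    m = max(z)
    e = [math.exp(zi - m) for zi in z]
    s = sum(e)
    return [ei / s for ei in e]


def entmax_bisect(z: Sequence[float], alpha: float = 1.5, n_iter: int = 60) -> List[float]:
    """alpha-entmax for a FIXED alpha > 1 (assumption: no learned alpha here).

    p_i = max((alpha - 1) * z_i - tau, 0) ** (1 / (alpha - 1)), tau chosen so sum(p) = 1.
    alpha = 2 gives sparsemax; alpha -> 1 approaches softmax.
    """
    if alpha <= 1.0:
        raise ValueError("this sketch assumes alpha > 1")
    if not z:
        raise ValueError("need at least one score")
    d = len(z)
    x = [(alpha - 1.0) * zi for zi in z]
    exponent = 1.0 / (alpha - 1.0)

    def mass(tau: float) -> List[float]:
        return [max(xi - tau, 0.0) ** exponent for xi in x]

    # At lo the largest entry alone contributes 1, so sum(mass(lo)) >= 1.
    # At hi every entry is at most 1/d, so sum(mass(hi)) <= 1.
    lo = max(x) - 1.0
    hi = max(x) - (1.0 / d) ** (alpha - 1.0)
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if sum(mass(mid)) >= 1.0:
            lo = mid  # too much (or exact) mass: raise the threshold
        else:
            hi = mid  # too little mass: lower the threshold
    p = mass(lo)
    total = sum(p)
    return [pi / total for pi in p]  # remove the small residual bisection error


if __name__ == "__main__":
    scores = [2.0, 1.0, 0.1, -1.0]  # toy per-segment attention scores
    print([round(p, 3) for p in entmax_bisect(scores, alpha=1.5)])
    print([round(p, 3) for p in softmax(scores)])
```

With these toy scores and α = 1.5, α-entmax returns approximately [0.831, 0.169, 0.0, 0.0], so the two lowest-scoring segments are dropped entirely, whereas softmax keeps some weight on all four.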
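The summary does not specify how Gated Temporal Pooling is computed. One common way to realise gated pooling, shown here purely as a hypothetical sketch, is to give each time step a scalar sigmoid gate computed from its own features and then take the gate-weighted mean over time; the gate parameters `w` and `b` and the names `gated_temporal_pool` and `stable_sigmoid` are illustrative assumptions, not the authors' design.

```python
import math
from typing import List, Sequence


def stable_sigmoid(x: float) -> float:
    """Logistic function written to avoid overflow in math.exp for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def gated_temporal_pool(
    h: Sequence[Sequence[float]], w: Sequence[float], b: float = 0.0
) -> List[float]:
    """HYPOTHETICAL gated pooling over T time steps of D-dimensional features.

    gate_t = sigmoid(w . h_t + b); output = sum_t gate_t * h_t / sum_t gate_t
    """
    if not h:
        raise ValueError("need at least one time step")
    gates = [stable_sigmoid(sum(wi * xi for wi, xi in zip(w, h_t)) + b) for h_t in h]
    norm = sum(gates)
    if norm == 0.0:  # only possible if every gate underflowed to 0.0
        raise ValueError("all gates underflowed to zero")
    dim = len(h[0])
    return [sum(g * h_t[k] for g, h_t in zip(gates, h)) / norm for k in range(dim)]


if __name__ == "__main__":
    # Toy features: two weak steps and one strong step.
    features = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
    print(gated_temporal_pool(features, w=[1.0, 1.0], b=-4.0))
```

Because the gates are normalised, the pooled vector is always a convex combination of the per-step features. Steps with gates near zero (for example, segments dominated by motion noise) contribute almost nothing, whereas a plain temporal mean would weight them equally; in the toy call above the result is pulled toward the strong step (about 4.61 per dimension, against a plain mean of 2.0).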
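The reported arousal result pairs 66.04% accuracy with a lower 61.97% weighted F1-score. Weighted F1 averages the per-class F1 scores, weighting each class by its share of the true labels (what scikit-learn calls `average="weighted"`); that the authors used exactly this definition is an assumption. The sketch below uses made-up toy labels, not data from the paper, to show how a partly misclassified minority class pulls weighted F1 below accuracy. The helper names `weighted_f1` and `accuracy` are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def weighted_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Support-weighted mean of per-class F1 over the classes present in y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    support = Counter(y_true)
    n = len(y_true)
    total = 0.0
    for c, n_c in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = n_c - tp
        f1_c = 2 * tp / (2 * tp + fp + fn)  # denominator > 0 since tp + fn = n_c > 0
        total += (n_c / n) * f1_c
    return total


def accuracy(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Toy binary labels: class 1 is the majority, class 0 the minority.
    y_true = [1, 1, 1, 1, 0, 0]
    y_pred = [1, 1, 1, 1, 1, 0]
    print(accuracy(y_true, y_pred))     # 5/6
    print(weighted_f1(y_true, y_pred))  # 22/27
```

Here accuracy is 5/6 ≈ 0.833 while weighted F1 is 22/27 ≈ 0.815: a single error on the minority class drops that class's F1 to 2/3, which the support weighting carries into the average.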