Robust Multimodal Fusion for Human Activity Recognition
Main Authors:
Format: Journal Article
Language: English
Published: 08.03.2023
Subjects:
Summary: The proliferation of IoT and mobile devices equipped with heterogeneous sensors has enabled new applications that rely on the fusion of time-series data generated by multiple sensors with different modalities. While there are promising deep neural network architectures for multimodal fusion, their performance degrades quickly in the presence of consecutive missing data and noise across multiple modalities/sensors, issues that are prevalent in real-world settings. We propose Centaur, a multimodal fusion model for human activity recognition (HAR) that is robust to these data quality issues. Centaur combines a data cleaning module, a denoising autoencoder with convolutional layers, and a multimodal fusion module, a deep convolutional neural network with a self-attention mechanism that captures cross-sensor correlation. We train Centaur using a stochastic data corruption scheme and evaluate it on three datasets containing data generated by multiple inertial measurement units. Centaur's data cleaning module outperforms two state-of-the-art autoencoder-based models, and its multimodal fusion module outperforms four strong baselines. Compared to two related robust fusion architectures, Centaur is more robust, achieving 11.59-17.52% higher HAR accuracy, especially in the presence of consecutive missing data in multiple sensor channels.
DOI: 10.48550/arxiv.2303.04636
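
The abstract describes two components: a convolutional denoising autoencoder for data cleaning and a self-attention-based convolutional network for multimodal fusion, trained with a stochastic data corruption scheme. The sketch below illustrates only the general idea of the data cleaning stage under assumptions of our own; it is not the authors' Centaur implementation, and the layer sizes, corruption rates, and the max_gap segment length are illustrative choices.

```python
# Illustrative sketch only: a 1D convolutional denoising autoencoder for
# multi-channel IMU time series, trained with stochastic corruption
# (additive noise plus one consecutive missing segment per sample).
# All hyperparameters here are assumptions, not the paper's values.
import torch
import torch.nn as nn

def corrupt(x, noise_std=0.1, max_gap=32):
    """Corrupt a batch (B, C, T): add Gaussian noise and zero out a
    consecutive segment in each sample to mimic missing sensor data."""
    noisy = x + noise_std * torch.randn_like(x)
    B, C, T = x.shape
    for b in range(B):
        gap = torch.randint(1, max_gap + 1, (1,)).item()
        start = torch.randint(0, T - gap + 1, (1,)).item()
        noisy[b, :, start:start + gap] = 0.0
    return noisy

class ConvDenoiser(nn.Module):
    """Convolutional encoder-decoder that reconstructs the clean signal."""
    def __init__(self, channels=9):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 32, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, stride=2, padding=2),
            nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose1d(64, 32, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
            nn.ReLU(),
            nn.ConvTranspose1d(32, channels, kernel_size=5, stride=2,
                               padding=2, output_padding=1),
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))

# Minimal training step: reconstruct the clean window from its corrupted copy.
model = ConvDenoiser(channels=9)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
clean = torch.randn(8, 9, 128)   # (batch, sensor channels, time steps)
loss = nn.functional.mse_loss(model(corrupt(clean)), clean)
loss.backward()
optimizer.step()
```

In the pipeline the abstract describes, the cleaned signals from all sensors would then feed a fusion network that uses self-attention to capture cross-sensor correlation before classifying the activity; that stage is not sketched here.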