Transportation Mode Recognition Fusing Wearable Motion, Sound, and Vision Sensors

Bibliographic Details
Published in	IEEE Sensors Journal, Vol. 20, no. 16, pp. 9314-9328
Main Authors	Richoz, Sebastien; Wang, Lin; Birch, Philip; Roggen, Daniel
Format	Journal Article
Language	English
Published	New York: IEEE, 15.08.2020; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Artificial neural networks; Automobiles; Cameras; Classifiers; data fusion; Data integration; Decision trees; Global Positioning System; Human activity recognition; Locomotion; Machine learning; mobile sensing; Neural networks; Post-processing; Public transportation; Recognition; Sensors; Sound; Transportation; transportation mode recognition; Vision; wearable computing; Wearable technology

More Information
Summary:	We present the first work that investigates the potential of improving the performance of transportation mode recognition through fusing multimodal data from wearable sensors: motion, sound and vision. We first train three independent deep neural network (DNN) classifiers, one for each of the three sensor types. We then propose two schemes that fuse the classification results from the three mono-modal classifiers. The first scheme makes an ensemble decision with fixed rules including Sum, Product, Majority Voting, and Borda Count. The second scheme is an adaptive fuser built as another classifier (including Naive Bayes, Decision Tree, Random Forest and Neural Network) that learns enhanced predictions by combining the outputs from the three mono-modal classifiers. We verify the advantage of the proposed method with the state-of-the-art Sussex-Huawei Locomotion and Transportation (SHL) dataset, recognizing eight transportation activities: Still, Walk, Run, Bike, Bus, Car, Train and Subway. We achieve F1 scores of 79.4%, 82.1% and 72.8% with the mono-modal motion, sound and vision classifiers, respectively. The two data fusion schemes raise the F1 score to 94.5% and 95.5%, respectively. The recognition performance can be further improved with a post-processing scheme that exploits the temporal continuity of transportation. When assessing generalization of the model to unseen data, we show that while performance is reduced, as expected, for each individual classifier, the benefits of fusion are retained, with performance improved by 15 percentage points. Besides the actual performance increase, this work, most importantly, opens up the possibility of dynamically fusing modalities to achieve distinct power-performance trade-offs at run time.
ISSN:	1530-437X; 1558-1748
DOI:	10.1109/JSEN.2020.2987306
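
The fixed-rule fusion named in the Summary (Sum, Product, Majority Voting, Borda Count) can be illustrated with a short sketch. This is a generic textbook version of the four rules, not the authors' implementation: the record does not say what the mono-modal classifiers output, how ties are broken, or how the rules are applied over time. The sketch assumes each classifier emits one probability per class, and all function names, the tie-breaking choice (lowest class index wins), and the example probabilities are hypothetical.

```python
from collections import Counter
from math import prod

# The eight activities listed in the Summary.
CLASSES = ["Still", "Walk", "Run", "Bike", "Bus", "Car", "Train", "Subway"]


def _argmax(scores):
    # Index of the largest score; ties go to the lowest index (an assumption).
    return max(range(len(scores)), key=lambda i: scores[i])


def sum_rule(probs):
    # Add the per-class probabilities across classifiers, pick the largest.
    return _argmax([sum(col) for col in zip(*probs)])


def product_rule(probs):
    # Multiply the per-class probabilities across classifiers.
    return _argmax([prod(col) for col in zip(*probs)])


def majority_vote(probs):
    # Each classifier votes for its top class; the most-voted class wins.
    votes = Counter(_argmax(p) for p in probs)
    top = max(votes.values())
    return min(c for c, n in votes.items() if n == top)


def borda_count(probs):
    # Each classifier ranks the classes; a class earns one point per class
    # ranked below it, and points are summed across classifiers.
    n = len(probs[0])
    points = [0] * n
    for p in probs:
        order = sorted(range(n), key=lambda i: p[i])  # worst first
        for rank, cls in enumerate(order):
            points[cls] += rank
    return _argmax(points)


# Made-up outputs of the motion, sound and vision classifiers for one window.
example = [
    [0.05, 0.05, 0.02, 0.03, 0.40, 0.35, 0.05, 0.05],  # motion: favors Bus
    [0.02, 0.03, 0.01, 0.04, 0.30, 0.45, 0.10, 0.05],  # sound: favors Car
    [0.05, 0.05, 0.05, 0.05, 0.20, 0.50, 0.05, 0.05],  # vision: favors Car
]

for rule in (sum_rule, product_rule, majority_vote, borda_count):
    print(rule.__name__, "->", CLASSES[rule(example)])
```

In this made-up example the motion classifier alone would pick Bus, while all four rules settle on Car. The rules can disagree with each other: a single classifier assigning near-zero probability to a class vetoes it under the Product rule, but not under Majority Voting.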