MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding

Bibliographic Details
Published in Proceedings / IEEE International Conference on Computer Vision, pp. 8657-8666
Main Authors Kong, Quan; Wu, Ziming; Deng, Ziwei; Klinkigt, Martin; Tong, Bin; Murakami, Tomokazu
Format Conference Proceeding
Language English
Published IEEE 01.10.2019
Abstract Unlike vision modalities, body-worn sensors and passive sensing can avoid failures of action understanding caused by vision-related challenges, e.g., occlusion and appearance variation. However, no standard large-scale dataset exists that integrates different types of modalities across vision and sensors. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves the performance of action recognition compared to models trained with only RGB information. The experimental results confirm the effectiveness of our model under cross-subject, -view, -scene and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
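The cross-modal distillation the abstract describes can be pictured with a minimal sketch: logits from several sensor-based teacher modalities are fused under softmax attention weights, and the fused soft targets supervise an RGB-based student via temperature-scaled KL divergence. This is an illustrative reconstruction, not the authors' code; all function names, the attention scores, and the temperature value are hypothetical.

```python
# Minimal sketch (assumed, not the MMAct authors' implementation) of
# attention-weighted knowledge distillation from sensor teachers to an
# RGB student. Pure Python for clarity; a real model would use tensors.
import math

def softmax(xs, t=1.0):
    """Numerically stable softmax with optional temperature t."""
    m = max(x / t for x in xs)
    exps = [math.exp(x / t - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_teacher_logits(teacher_logits, attn_scores):
    """Combine per-modality teacher logits, weighted by attention scores."""
    weights = softmax(attn_scores)          # one scalar weight per modality
    n_classes = len(teacher_logits[0])
    fused = [0.0] * n_classes
    for w, logits in zip(weights, teacher_logits):
        for c in range(n_classes):
            fused[c] += w * logits[c]
    return fused

def distillation_loss(student_logits, fused_teacher_logits, temperature=4.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(fused_teacher_logits, temperature)   # soft targets
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: acceleration and gyroscope teachers guiding an RGB student.
acc_logits = [2.0, 0.5, -1.0]
gyro_logits = [1.5, 1.0, -0.5]
fused = fuse_teacher_logits([acc_logits, gyro_logits], attn_scores=[0.8, 0.2])
loss = distillation_loss(student_logits=[0.1, 0.2, 0.0],
                         fused_teacher_logits=fused)
```

In training, this distillation term would be added to the usual cross-entropy loss on ground-truth labels, with the attention scores letting the model down-weight sensor modalities that are uninformative for a given clip.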
Author Kong, Quan
Wu, Ziming
Deng, Ziwei
Klinkigt, Martin
Tong, Bin
Murakami, Tomokazu
Author_xml – sequence: 1
  givenname: Quan
  surname: Kong
  fullname: Kong, Quan
  organization: Hitachi
– sequence: 2
  givenname: Ziming
  surname: Wu
  fullname: Wu, Ziming
  organization: Hong Kong University of Science and Technology
– sequence: 3
  givenname: Ziwei
  surname: Deng
  fullname: Deng, Ziwei
  organization: Hitachi
– sequence: 4
  givenname: Martin
  surname: Klinkigt
  fullname: Klinkigt, Martin
  organization: Hitachi
– sequence: 5
  givenname: Bin
  surname: Tong
  fullname: Tong, Bin
  organization: Hitachi
– sequence: 6
  givenname: Tomokazu
  surname: Murakami
  fullname: Murakami, Tomokazu
  organization: Hitachi
ContentType Conference Proceeding
DOI 10.1109/ICCV.2019.00875
Discipline Applied Sciences
EISBN 9781728148038
1728148030
EISSN 2380-7504
EndPage 8666
ExternalDocumentID 9009579
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2019-10-01
PublicationDateYYYYMMDD 2019-10-01
PublicationDate_xml – month: 10
  year: 2019
  text: 2019-10-01
  day: 01
PublicationDecade 2010
PublicationTitle Proceedings / IEEE International Conference on Computer Vision
PublicationTitleAbbrev ICCV
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 8657
SubjectTerms Adaptation models
Cameras
Gyroscopes
Sensors
Task analysis
Three-dimensional displays
Videos
Title MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding
URI https://ieeexplore.ieee.org/document/9009579