MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding
| Published in | Proceedings / IEEE International Conference on Computer Vision, pp. 8657–8666 |
|---|---|
| Main Authors | Kong, Quan; Wu, Ziming; Deng, Ziwei; Klinkigt, Martin; Tong, Bin; Murakami, Tomokazu |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE, 01.10.2019 |
| Subjects | Adaptation models; Cameras; Gyroscopes; Sensors; Task analysis; Three-dimensional displays; Videos |
| Online Access | https://ieeexplore.ieee.org/document/9009579 |
Abstract | Unlike vision modalities, body-worn sensors or passive sensing can avoid the failures of action understanding caused by vision-related challenges, e.g., occlusion and appearance variation. However, no standard large-scale dataset exists in which different types of modalities across vision and sensors are integrated. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves the performance of action recognition compared to models trained with only RGB information. The experimental results confirm the effectiveness of our model on cross-subject, -view, -scene and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding. |
---|---|
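For readers curious how the teacher-student transfer described in the abstract is typically realized, below is a minimal, generic sketch of a knowledge-distillation objective (temperature-softened softmax with a KL-divergence loss) in plain NumPy. This is an illustration of the general technique only, under assumed names: the paper's actual multi-modality model, its attention mechanism, and its exact loss are not reproduced here.

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax; higher T softens the distribution.
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=4.0):
    """KL(teacher || student) on temperature-softened class distributions.

    In a cross-modal setting like MMAct's, the teacher would be a
    sensor-based model and the student a vision (RGB) model; this
    generic loss stands in for the paper's adaptive transfer scheme.
    """
    p = softmax(teacher_logits, T)  # soft targets from the teacher
    q = softmax(student_logits, T)  # student predictions
    kl = np.sum(p * (np.log(p + 1e-12) - np.log(q + 1e-12)), axis=-1)
    return (T ** 2) * kl.mean()     # T^2 rescales gradients to a comparable magnitude

# Toy example: two samples, 37 action classes (as in MMAct).
rng = np.random.default_rng(0)
teacher = rng.normal(size=(2, 37))
student = rng.normal(size=(2, 37))
print(distillation_loss(student, teacher))  # positive scalar
print(distillation_loss(teacher, teacher))  # ~0 for identical logits
```

In practice this distillation term is added to the student's ordinary cross-entropy loss on ground-truth labels, so the vision model learns both from annotations and from the sensor teacher's softened predictions.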
Author | Kong, Quan (Hitachi, Ltd.); Wu, Ziming (Hong Kong University of Science and Technology); Deng, Ziwei (Hitachi, Ltd.); Klinkigt, Martin (Hitachi, Ltd.); Tong, Bin (Hitachi, Ltd.); Murakami, Tomokazu (Hitachi, Ltd.) |
ContentType | Conference Proceeding |
DOI | 10.1109/ICCV.2019.00875 |
Discipline | Applied Sciences |
EISBN | 9781728148038; 1728148030 |
EISSN | 2380-7504 |
EndPage | 8666 |
ExternalDocumentID | 9009579 |
Genre | orig-research |
IsPeerReviewed | false |
IsScholarly | true |
PageCount | 10 |
PublicationDate | 2019-10-01 |
PublicationTitle | Proceedings / IEEE International Conference on Computer Vision |
PublicationTitleAbbrev | ICCV |
PublicationYear | 2019 |
Publisher | IEEE |
StartPage | 8657 |
SubjectTerms | Adaptation models; Cameras; Gyroscopes; Sensors; Task analysis; Three-dimensional displays; Videos |
URI | https://ieeexplore.ieee.org/document/9009579 |