MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding

Bibliographic Details
Published in Proceedings / IEEE International Conference on Computer Vision, pp. 8657-8666
Main Authors Kong, Quan; Wu, Ziming; Deng, Ziwei; Klinkigt, Martin; Tong, Bin; Murakami, Tomokazu
Format Conference Proceeding
Language English
Published IEEE 01.10.2019
Abstract Unlike vision modalities, body-worn sensors and passive sensing can avoid failures of action understanding caused by vision-related challenges, e.g., occlusion and appearance variation. However, no standard large-scale dataset exists that integrates different types of modalities across vision and sensors. To address the disadvantages of vision-based modalities and push towards multi/cross-modal action understanding, this paper introduces a new large-scale dataset recorded from 20 distinct subjects with seven different types of modalities: RGB videos, keypoints, acceleration, gyroscope, orientation, Wi-Fi and pressure signal. The dataset consists of more than 36k video clips for 37 action classes covering a wide range of daily-life activities, such as desktop-related and check-in-based ones, in four distinct scenarios. On the basis of our dataset, we propose a novel multi-modality distillation model with an attention mechanism to realize adaptive knowledge transfer from sensor-based modalities to vision-based modalities. The proposed model significantly improves the performance of action recognition compared to models trained with only RGB information. The experimental results confirm the effectiveness of our model under cross-subject, -view, -scene and -session evaluation criteria. We believe that this new large-scale multimodal dataset will contribute to the community of multimodal-based action understanding.
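The cross-modal distillation the abstract describes can be pictured with a minimal sketch: logits from several sensor-based teacher modalities are fused under softmax attention weights, and the fused soft targets supervise an RGB-based student via temperature-scaled KL divergence. This is an illustrative reconstruction, not the authors' code; all function names, the attention scores, and the temperature value are hypothetical.

```python
# Minimal sketch (assumed, not the MMAct authors' implementation) of
# attention-weighted knowledge distillation from sensor teachers to an
# RGB student. Pure Python for clarity; a real model would use tensors.
import math

def softmax(xs, t=1.0):
    """Numerically stable softmax with optional temperature t."""
    m = max(x / t for x in xs)
    exps = [math.exp(x / t - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def fuse_teacher_logits(teacher_logits, attn_scores):
    """Combine per-modality teacher logits, weighted by attention scores."""
    weights = softmax(attn_scores)          # one scalar weight per modality
    n_classes = len(teacher_logits[0])
    fused = [0.0] * n_classes
    for w, logits in zip(weights, teacher_logits):
        for c in range(n_classes):
            fused[c] += w * logits[c]
    return fused

def distillation_loss(student_logits, fused_teacher_logits, temperature=4.0):
    """KL(teacher || student) between temperature-softened distributions."""
    p = softmax(fused_teacher_logits, temperature)   # soft targets
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

# Example: acceleration and gyroscope teachers guiding an RGB student.
acc_logits = [2.0, 0.5, -1.0]
gyro_logits = [1.5, 1.0, -0.5]
fused = fuse_teacher_logits([acc_logits, gyro_logits], attn_scores=[0.8, 0.2])
loss = distillation_loss(student_logits=[0.1, 0.2, 0.0],
                         fused_teacher_logits=fused)
```

In training, this distillation term would be added to the usual cross-entropy loss on ground-truth labels, with the attention scores letting the model down-weight sensor modalities that are uninformative for a given clip.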
Author Kong, Quan
Wu, Ziming
Deng, Ziwei
Klinkigt, Martin
Tong, Bin
Murakami, Tomokazu
Author_xml – sequence: 1
  givenname: Quan
  surname: Kong
  fullname: Kong, Quan
  organization: Hitachi
– sequence: 2
  givenname: Ziming
  surname: Wu
  fullname: Wu, Ziming
  organization: Hong Kong University of Science and Technology
– sequence: 3
  givenname: Ziwei
  surname: Deng
  fullname: Deng, Ziwei
  organization: Hitachi
– sequence: 4
  givenname: Martin
  surname: Klinkigt
  fullname: Klinkigt, Martin
  organization: Hitachi
– sequence: 5
  givenname: Bin
  surname: Tong
  fullname: Tong, Bin
  organization: Hitachi
– sequence: 6
  givenname: Tomokazu
  surname: Murakami
  fullname: Murakami, Tomokazu
  organization: Hitachi
ContentType Conference Proceeding
DOI 10.1109/ICCV.2019.00875
Discipline Applied Sciences
EISBN 9781728148038
1728148030
EISSN 2380-7504
EndPage 8666
ExternalDocumentID 9009579
Genre orig-research
IsPeerReviewed false
IsScholarly true
Language English
PageCount 10
PublicationCentury 2000
PublicationDate 2019-10-01
PublicationDateYYYYMMDD 2019-10-01
PublicationDate_xml – month: 10
  year: 2019
  text: 2019-10-01
  day: 01
PublicationDecade 2010
PublicationTitle Proceedings / IEEE International Conference on Computer Vision
PublicationTitleAbbrev ICCV
PublicationYear 2019
Publisher IEEE
Publisher_xml – name: IEEE
SourceID ieee
SourceType Publisher
StartPage 8657
SubjectTerms Adaptation models
Cameras
Gyroscopes
Sensors
Task analysis
Three-dimensional displays
Videos
Title MMAct: A Large-Scale Dataset for Cross Modal Human Action Understanding
URI https://ieeexplore.ieee.org/document/9009579