EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts

Bibliographic Details
Main Authors Haneji, Yuto, Nishimura, Taichi, Kameko, Hirotaka, Shirai, Keisuke, Yoshida, Tomoya, Kajimura, Keiya, Yamamoto, Koki, Cui, Taiyu, Nishimoto, Tomohiro, Mori, Shinsuke
Format Journal Article
Language English
Published 07.10.2024

Abstract Mistake action detection from egocentric videos is crucial for developing intelligent archives that detect workers' errors and provide feedback. Previous studies have been limited to specific domains, focused on detecting mistakes from videos without procedural texts, and analyzed whether actions are mistakes. To address these limitations, in this paper, we propose the EgoOops dataset, which includes egocentric videos, procedural texts, and three types of annotations: video-text alignment, mistake labels, and descriptions for mistakes. EgoOops covers five procedural domains and includes 50 egocentric videos. The video-text alignment allows the model to detect mistakes based on both videos and procedural texts. The mistake labels and descriptions enable detailed analysis of real-world mistakes. Based on EgoOops, we tackle two tasks: video-text alignment and mistake detection. For video-text alignment, we enhance the recent StepFormer model with an additional loss for fine-tuning. Based on the alignment results, we propose a multi-modal classifier to predict mistake labels. In our experiments, the proposed methods achieve higher performance than the baselines. In addition, our ablation study demonstrates the effectiveness of combining videos and texts. We will release the dataset and codes upon publication.
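The abstract describes a multi-modal classifier that predicts mistake labels from aligned video segments and procedure-step text. The paper's actual architecture is not given in this record; as an illustrative sketch only (the embedding sizes, concatenation-based late fusion, and the binary label set are all assumptions), such a classifier might look like:

```python
import numpy as np

rng = np.random.default_rng(0)

VIDEO_DIM, TEXT_DIM = 512, 384   # assumed embedding sizes, not from the paper
LABELS = ["correct", "mistake"]  # simplified label set (assumption)

# Hypothetical fused classifier: concatenate the two modalities,
# then apply a single linear layer followed by a softmax.
W = rng.normal(0.0, 0.01, (VIDEO_DIM + TEXT_DIM, len(LABELS)))
b = np.zeros(len(LABELS))

def predict(video_feat: np.ndarray, text_feat: np.ndarray) -> np.ndarray:
    """Return a probability distribution over mistake labels."""
    fused = np.concatenate([video_feat, text_feat])  # late fusion
    logits = fused @ W + b
    exp = np.exp(logits - logits.max())              # numerically stable softmax
    return exp / exp.sum()

probs = predict(rng.normal(size=VIDEO_DIM), rng.normal(size=TEXT_DIM))
```

The sketch only shows the fusion idea the abstract alludes to (combining video and text features outperforming either alone); the paper's model is trained end-to-end on the alignment results, which this toy example does not reproduce.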
Copyright http://arxiv.org/licenses/nonexclusive-distrib/1.0
DOI 10.48550/arxiv.2410.05343
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed false
IsScholarly false
OpenAccessLink https://arxiv.org/abs/2410.05343
PublicationDate 2024-10-07
SecondaryResourceType preprint
SourceType Open Access Repository
SubjectTerms Computer Science - Artificial Intelligence
Computer Science - Computation and Language
Computer Science - Computer Vision and Pattern Recognition
URI https://arxiv.org/abs/2410.05343