EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos with Procedural Texts
Main Authors | Haneji, Yuto; Nishimura, Taichi; Kameko, Hirotaka; Shirai, Keisuke; Yoshida, Tomoya; Kajimura, Keiya; Yamamoto, Koki; Cui, Taiyu; Nishimoto, Tomohiro; Mori, Shinsuke |
---|---|
Format | Journal Article |
Language | English |
Published | 07.10.2024 |
Subjects | Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Computer Vision and Pattern Recognition |
Online Access | https://arxiv.org/abs/2410.05343 |
Abstract | Mistake action detection from egocentric videos is crucial for developing
intelligent archives that detect workers' errors and provide feedback. Previous
studies have been limited to specific domains, focused on detecting mistakes
from videos without procedural texts, and analyzed whether actions are
mistakes. To address these limitations, in this paper, we propose the EgoOops
dataset, which includes egocentric videos, procedural texts, and three types of
annotations: video-text alignment, mistake labels, and descriptions for
mistakes. EgoOops covers five procedural domains and includes 50 egocentric
videos. The video-text alignment allows the model to detect mistakes based on
both videos and procedural texts. The mistake labels and descriptions enable
detailed analysis of real-world mistakes. Based on EgoOops, we tackle two
tasks: video-text alignment and mistake detection. For video-text alignment, we
enhance the recent StepFormer model with an additional loss for fine-tuning.
Based on the alignment results, we propose a multi-modal classifier to predict
mistake labels. In our experiments, the proposed methods achieve higher
performance than the baselines. In addition, our ablation study demonstrates
the effectiveness of combining videos and texts. We will release the dataset
and codes upon publication. |
Author | Shirai, Keisuke; Yoshida, Tomoya; Nishimoto, Tomohiro; Haneji, Yuto; Kameko, Hirotaka; Mori, Shinsuke; Cui, Taiyu; Yamamoto, Koki; Nishimura, Taichi; Kajimura, Keiya |
Copyright | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
DOI | 10.48550/arxiv.2410.05343 |
OpenAccessLink | https://arxiv.org/abs/2410.05343 |
SecondaryResourceType | preprint |
SubjectTerms | Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Computer Vision and Pattern Recognition |
URI | https://arxiv.org/abs/2410.05343 |
linkProvider | Cornell University |