Egocentric Biochemical Video-and-Language Dataset

Bibliographic Details
Published in	2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 3122-3126
Main Authors	Nishimura, Taichi; Sakoda, Kojiro; Hashimoto, Atsushi; Ushiku, Yoshitaka; Tanaka, Natsuko; Ono, Fumihito; Kameko, Hirotaka; Mori, Shinsuke
Format	Conference Proceeding
Language	English
Published	IEEE 01.01.2021
Subjects	Annotations; Biological system modeling; Computer vision; Conferences; Data collection; Protocols; Visualization
Summary:	This paper proposes a novel biochemical video-and-language (BioVL) dataset, which consists of experimental videos, corresponding protocols, and annotations of alignment between events in the video and instructions in the protocol. The key strength of the dataset is its user-oriented design of data collection. We envision that biochemical researchers will be able to record videos easily and share them so that other researchers can replicate their experiments. To minimize the burden of video recording, we adopted unedited first-person video as the visual source. As a result, we collected 16 videos from four protocols with a total length of 1.6 hours. In our experiments, we conduct two zero-shot video-and-language tasks on the BioVL dataset. Our experimental results show that there is large room for improvement for practical use, even when utilizing a state-of-the-art pre-trained video-and-language joint embedding model. We are going to release the BioVL dataset. To our knowledge, this work is the first attempt to release a biochemical video-and-language dataset.
ISSN:	2473-9944
DOI:	10.1109/ICCVW54120.2021.00348