Egocentric Biochemical Video-and-Language Dataset


Bibliographic Details
Published in: 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 3122-3126
Main Authors: Nishimura, Taichi; Sakoda, Kojiro; Hashimoto, Atsushi; Ushiku, Yoshitaka; Tanaka, Natsuko; Ono, Fumihito; Kameko, Hirotaka; Mori, Shinsuke
Format: Conference Proceeding
Language: English
Published: IEEE, 01.01.2021

Summary: This paper proposes a novel biochemical video-and-language (BioVL) dataset, which consists of experimental videos, corresponding protocols, and annotations of the alignment between events in the videos and instructions in the protocols. The key strength of the dataset is its user-oriented design of data collection. We envision that biochemical researchers can easily record videos and share them so that other researchers can replicate the experiments in the future. To minimize the burden of video recording, we adopted unedited first-person video as the visual source. As a result, we collected 16 videos covering four protocols, with a total length of 1.6 hours. In our experiments, we conduct two zero-shot video-and-language tasks on the BioVL dataset. Our experimental results show substantial room for improvement before practical use, even when utilizing a state-of-the-art pre-trained video-and-language joint embedding model. We will release the BioVL dataset. To our knowledge, this work is the first attempt to release a biochemical video-and-language dataset.
ISSN:2473-9944
DOI:10.1109/ICCVW54120.2021.00348