Egocentric Biochemical Video-and-Language Dataset
Published in | 2021 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW) pp. 3122 - 3126 |
---|---|
Format | Conference Proceeding |
Language | English |
Published | IEEE, 01.01.2021 |
Summary: | This paper proposes a novel biochemical video-and-language (BioVL) dataset, which consists of experimental videos, the corresponding protocols, and annotations aligning events in the videos with instructions in the protocols. The key strength of the dataset is its user-oriented design of data collection. We envision a future in which biochemical researchers can easily record videos and share them so that other researchers can replicate their experiments. To minimize the burden of video recording, we adopted unedited first-person video as the visual source. As a result, we collected 16 videos from four protocols with a total length of 1.6 hours. In our experiments, we conduct two zero-shot video-and-language tasks on the BioVL dataset. Our experimental results show that there is large room for improvement for practical use, even when utilizing a state-of-the-art pre-trained video-and-language joint embedding model. We are going to release the BioVL dataset. To our knowledge, this work is the first attempt to release a biochemical video-and-language dataset. |
---|---|
ISSN: | 2473-9944 |
DOI: | 10.1109/ICCVW54120.2021.00348 |