TVR-Ranking: A Dataset for Ranked Video Moment Retrieval with Imprecise Queries

Bibliographic Details
Published in: arXiv.org
Main Authors: Liang, Renjie; Li, Li; Zhang, Chongzhi; Wang, Jing; Zhu, Xizhou; Sun, Aixin
Format: Paper
Language: English
Published: Ithaca: Cornell University Library, arXiv.org, 24.07.2024

Summary: In this paper, we propose the task of \textit{Ranked Video Moment Retrieval} (RVMR) to locate a ranked list of matching moments from a collection of videos, through queries in natural language. Although a few related tasks have been proposed and studied by the CV, NLP, and IR communities, RVMR is the task that best reflects the practical setting of moment search. To facilitate research in RVMR, we develop the TVR-Ranking dataset, based on the raw videos and existing moment annotations provided in the TVR dataset. Our key contribution is the manual annotation of relevance levels for 94,442 query-moment pairs. We then develop the \(NDCG@K, IoU\geq \mu\) evaluation metric for this new task and conduct experiments to evaluate three baseline models. Our experiments show that the new RVMR task brings new challenges to existing models, and we believe this new dataset contributes to research on multi-modality search. The dataset is available at \url{https://github.com/Ranking-VMR/TVR-Ranking}
ISSN:2331-8422
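The summary names an \(NDCG@K, IoU\geq \mu\) evaluation metric: a retrieved moment only earns its annotated relevance level if its temporal IoU with a ground-truth moment reaches the threshold \(\mu\), and the discounted gains of the top \(K\) results are normalized by an ideal ranking. The sketch below illustrates that idea in plain Python; the interval representation, graded-relevance convention, and function names are assumptions for illustration, not the dataset's official evaluation code.

```python
import math

def temporal_iou(pred, gt):
    """IoU between two time intervals given as (start, end) in seconds."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
    return inter / union if union > 0 else 0.0

def ndcg_at_k(ranked, gt_moments, k, mu):
    """NDCG@K with an IoU >= mu gate, for a single query.

    ranked:     list of (video_id, (start, end)) predictions, best first.
    gt_moments: dict video_id -> list of ((start, end), relevance_level).
    A prediction is credited with a ground-truth moment's relevance
    only if their temporal IoU meets the threshold mu.
    """
    gains = []
    for video_id, span in ranked[:k]:
        rel = 0
        for gt_span, gt_rel in gt_moments.get(video_id, []):
            if temporal_iou(span, gt_span) >= mu:
                rel = max(rel, gt_rel)  # best-matching annotation wins
        gains.append(rel)
    dcg = sum(g / math.log2(i + 2) for i, g in enumerate(gains))
    # Ideal DCG: the k highest relevance levels available for this query.
    ideal = sorted((r for anns in gt_moments.values() for _, r in anns),
                   reverse=True)[:k]
    idcg = sum(g / math.log2(i + 2) for i, g in enumerate(ideal))
    return dcg / idcg if idcg > 0 else 0.0
```

For example, a single prediction that exactly matches the one annotated moment scores 1.0, while a prediction with no temporal overlap scores 0.0, regardless of its rank.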