The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation
Format: Journal Article
Language: English
Published: 16.11.2023
Summary: We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of
high-quality audio-caption pairs designed for evaluating music-and-language
models. The dataset consists of 1.1k human-written natural language
descriptions of 706 music recordings, all publicly accessible and released
under Creative Commons licenses. To showcase the use of our dataset, we
benchmark popular models on three key music-and-language tasks: music
captioning, text-to-music generation, and music-language retrieval. Our
experiments highlight the importance of cross-dataset evaluation and offer
insights into how researchers can use SDD to gain a broader understanding of
model performance.
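The summary does not state which retrieval metric is used. Because there are more captions (1.1k) than recordings (706), at least some recordings have several captions, so text-to-music retrieval is naturally scored per caption. The sketch below assumes Recall@K with cosine similarity over precomputed embeddings. The function name `text_to_audio_recall_at_k`, its parameters, and the embedding inputs are hypothetical illustrations, not the paper's evaluation code.

```python
import numpy as np

def text_to_audio_recall_at_k(text_emb, audio_emb, caption_to_audio, k=10):
    """Hypothetical Recall@K for text-to-audio retrieval (illustrative sketch).

    text_emb: (num_captions, d) caption embeddings from some text encoder
    audio_emb: (num_recordings, d) recording embeddings from some audio encoder
    caption_to_audio: (num_captions,) index of the recording each caption describes
    """
    # L2-normalise both sides so that dot products are cosine similarities
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    sims = t @ a.T  # (num_captions, num_recordings)
    # For each caption, the indices of the k most similar recordings
    top_k = np.argsort(-sims, axis=1)[:, :k]
    # A caption counts as a hit if its own recording appears among the top k
    hits = (top_k == np.asarray(caption_to_audio)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy example: 3 recordings, 4 captions; recording 1 has two captions
audio = np.eye(3)
text = np.array([[1.0, 0, 0], [0, 1.0, 0], [0, 0, 1.0], [1.0, 0.1, 0]])
mapping = [0, 1, 2, 1]
print(text_to_audio_recall_at_k(text, audio, mapping, k=1))  # 0.75
print(text_to_audio_recall_at_k(text, audio, mapping, k=2))  # 1.0
```

Scoring per caption, rather than per recording, lets every human-written description contribute to the metric even when a recording has multiple captions.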
DOI: 10.48550/arxiv.2311.10057