The Song Describer Dataset: a Corpus of Audio Captions for Music-and-Language Evaluation


Bibliographic Details
Main Authors	Manco, Ilaria; Weck, Benno; Doh, SeungHeon; Won, Minz; Zhang, Yixiao; Bogdanov, Dmitry; Wu, Yusong; Chen, Ke; Tovstogan, Philip; Benetos, Emmanouil; Quinton, Elio; Fazekas, György; Nam, Juhan
Format	Journal Article
Language	English
Published	16.11.2023
Subjects	Computer Science - Artificial Intelligence; Computer Science - Computation and Language; Computer Science - Sound

Summary:	We introduce the Song Describer dataset (SDD), a new crowdsourced corpus of high-quality audio-caption pairs, designed for the evaluation of music-and-language models. The dataset consists of 1.1k human-written natural language descriptions of 706 music recordings, all publicly accessible and released under Creative Commons licenses. To showcase the use of our dataset, we benchmark popular models on three key music-and-language tasks (music captioning, text-to-music generation and music-language retrieval). Our experiments highlight the importance of cross-dataset evaluation and offer insights into how researchers can use SDD to gain a broader understanding of model performance.
DOI:	10.48550/arxiv.2311.10057