Towards efficient audio thumbnailing

Bibliographic Details
Published in 2014 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5192-5196
Main Authors Jiang, Nanzhu; Müller, Meinard
Format Conference Proceeding
Language English
Published IEEE 01.05.2014

Summary: Audio thumbnailing, which aims at finding the most representative audio segment of a music recording, is an important task in music information retrieval. In this paper, we show how the computational efficiency of a recently proposed state-of-the-art thumbnailing approach can be improved significantly. The basic idea of the previous approach is to compute for each possible segment a fitness value that expresses repetitiveness and then to define the thumbnail as the fitness-maximizing segment. As a first acceleration strategy, we propose an efficient multi-level sampling strategy to reduce the number of segments the fitness has to be computed for. Second, we obtain further accelerations by suitably adjusting the resolution used in the fitness computation depending on the level of the segment. As a third contribution, we exploit an intrinsic property of the fitness computation that allows us to estimate the fitness for certain segments without any further computation. Our experimental results show that combining these three strategies leads to accelerations by a factor of 20 to 200 depending on the duration of the song while keeping the overall accuracy for the thumbnail estimation.
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2014.6854593
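
Illustrative sketch: the summary describes the thumbnail as the fitness-maximizing segment and mentions a multi-level sampling strategy that reduces the number of segments to evaluate. The Python sketch below contrasts an exhaustive search with a generic coarse-to-fine search. It is not the authors' algorithm: `toy_fitness` is a hypothetical stand-in for the repetitiveness-based fitness, and the function names, the grid `step`, and the halving refinement scheme are all assumptions made for illustration.

```python
from typing import Callable, Tuple

Segment = Tuple[int, int]  # (start, end) in frames, end exclusive


def toy_fitness(seg: Segment) -> float:
    """Hypothetical stand-in for a real fitness measure; peaks at (40, 70)."""
    start, end = seg
    return -abs(start - 40) - abs(end - 70)


def exhaustive_thumbnail(n: int, fitness: Callable[[Segment], float]):
    """Evaluate the fitness of every segment; return (best segment, evaluations)."""
    best, best_val, evals = None, float("-inf"), 0
    for start in range(n):
        for end in range(start + 1, n + 1):
            evals += 1
            val = fitness((start, end))
            if val > best_val:
                best, best_val = (start, end), val
    return best, evals


def coarse_to_fine_thumbnail(n: int, fitness: Callable[[Segment], float], step: int = 8):
    """Generic multi-level search: coarse grid first, then refine around the best.

    `step` is assumed to be a power of two so that halving ends at 1.
    """
    best, best_val, evals = None, float("-inf"), 0
    # Coarsest level: segments whose boundaries lie on a grid of spacing `step`.
    candidates = [(s, e)
                  for s in range(0, n, step)
                  for e in range(s + step, n + 1, step)]
    while True:
        for seg in candidates:
            evals += 1
            val = fitness(seg)
            if val > best_val:
                best, best_val = seg, val
        if step == 1:
            return best, evals
        # Next level: halve the spacing and look only near the current best.
        step //= 2
        s0, e0 = best
        candidates = [(s, e)
                      for s in (s0 - step, s0, s0 + step)
                      for e in (e0 - step, e0, e0 + step)
                      if 0 <= s < e <= n]
```

On this smooth toy fitness both searches find the same segment, with the coarse-to-fine variant evaluating far fewer candidates. A real fitness landscape need not be this well behaved, so a greedy refinement like this one could miss the true maximum; the paper's own strategies, which are reported to preserve accuracy, are not reproduced here.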