Multiobjective Time Series Matching for Audio Classification and Retrieval

Seeking sound samples in a massive database can be a tedious and time consuming task. Even when metadata are available, query results may remain far from the timbre expected by users. This problem stems from the nature of query specification, which does not account for the underlying complexity of a...

Full description

Saved in:
Bibliographic Details
Published inIEEE transactions on audio, speech, and language processing Vol. 21; no. 10; pp. 2057 - 2072
Main Authors Esling, Philippe, Agon, Carlos
Format Journal Article
LanguageEnglish
Published Piscataway, NJ IEEE 01.10.2013
Institute of Electrical and Electronics Engineers
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:Seeking sound samples in a massive database can be a tedious and time consuming task. Even when metadata are available, query results may remain far from the timbre expected by users. This problem stems from the nature of query specification, which does not account for the underlying complexity of audio data. The Query By Example (QBE) paradigm tries to tackle this shortcoming by finding audio clips similar to a given sound example. However, it requires users to have a well-formed soundfile of what they seek, which is not always a valid assumption. Furthermore, most audio-retrieval systems rely on a single measure of similarity, which is unlikely to convey the perceptual similarity of audio signals. We address in this paper an innovative way of querying generic audio databases by simultaneously optimizing the temporal evolution of multiple spectral properties. We show how this problem can be cast into a new approach merging multiobjective optimization and time series matching, called MultiObjective Time Series (MOTS) matching. We formally state this problem and report an efficient implementation. This approach introduces a multidimensional assessment of similarity in audio matching. This allows to cope with the multidimensional nature of timbre perception and also to obtain a set of efficient propositions rather than a single best solution. To demonstrate the performances of our approach, we show its efficiency in audio classification tasks. By introducing a selection criterion based on the hypervolume dominated by a class, we show that our approach outstands the state-of-art methods in audio classification even with a few number of features. We demonstrate its robustness to several classes of audio distortions. Finally, we introduce two innovative applications of our method for sound querying.
Bibliography:ObjectType-Article-1
SourceType-Scholarly Journals-1
ObjectType-Feature-2
content type line 23
ISSN:1558-7916
1558-7924
DOI:10.1109/TASL.2013.2265086