Content-based retrieval of MP3 songs based on query by singing

Bibliographic Details
Published in: 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing, Vol. 5, pp. V-929
Main Authors: LIE, Wen-Nung; SU, Chen-Kang
Format: Conference Proceeding
Language: English
Published: Piscataway, N.J.: IEEE, 2004
Summary: With the growth of multimedia on the Internet, content analysis of multimedia plays an important role in humanistic management. We investigate content-based retrieval of MP3 songs through a query-by-singing interface. MDCT (modified DCT) spectral coefficients are used directly to represent the tonic characteristics of a short-term sound. This spectral profile is used for detailed matching between two audio segments. Perceptual features are also computed from the MDCT coefficients for audio classification. Two pre-stages based on SVM and k-means classification remove incorrect (or noisy) segment candidates and speed up the subsequent matching process. In addition, exponential key-scaling schemes and time-warping techniques are developed to overcome key differences and tempo variations between singers. Experiments show that the retrieval probability of our design reaches up to 76% within the top 5 results, from a database of 114 excerpts.
ISBN: 9780780384842, 0780384849
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2004.1327264
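
Illustrative note: the summary mentions key scaling and time warping to cope with singers who use a different key or tempo. The sketch below shows one generic way these two ideas can be combined. It is not the authors' method: it uses textbook dynamic time warping (DTW) on plain pitch sequences in Hz rather than MDCT coefficients, and the names `key_scale`, `dtw_distance`, and `best_match` are hypothetical. The only borrowed fact is that shifting a pitch by k semitones multiplies its frequency by 2^(k/12), which is one plausible reading of "exponential" key scaling.

```python
import math


def key_scale(freqs_hz, semitones):
    """Shift every frequency by `semitones` (equal temperament): f * 2^(k/12)."""
    factor = 2.0 ** (semitones / 12.0)
    return [f * factor for f in freqs_hz]


def dtw_distance(a, b):
    """Textbook DTW distance between two 1-D sequences (absolute-difference cost)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances alone
                                 cost[i][j - 1],      # b advances alone
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]


def best_match(query_hz, reference_hz, shifts=range(-6, 7)):
    """Try several key shifts of the query; return (shift, distance) with the
    lowest DTW distance, compared in log-frequency (octave) units.
    The shift range of -6..+6 semitones is an assumption for illustration."""
    ref_log = [math.log2(f) for f in reference_hz]
    best = None
    for k in shifts:
        q_log = [math.log2(f) for f in key_scale(query_hz, k)]
        d = dtw_distance(q_log, ref_log)
        if best is None or d < best[1]:
            best = (k, d)
    return best


if __name__ == "__main__":
    # Made-up pitch contour for illustration only.
    reference = [220.0, 220.0, 247.0, 262.0, 262.0, 294.0]
    # The "singer" is slower (repeated frames) and two semitones lower.
    slower = [220.0, 220.0, 220.0, 247.0, 247.0, 262.0, 262.0, 262.0, 294.0, 294.0]
    query = key_scale(slower, -2)
    print(best_match(query, reference))
```

In this toy setup the search over key shifts absorbs the key difference, and DTW absorbs the tempo difference by letting one frame of the reference align with several frames of the query. A real system would also need a pitch or spectral front end, which is outside the scope of this sketch.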