Soundtrack classification by transient events
Published in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 473-476
Main Authors: , ,
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2011
Summary: We present a method for video classification based on information in the soundtrack. Unlike previous approaches which describe the audio via statistics of mel-frequency cepstral coefficient (MFCC) features calculated on uniformly-spaced frames, we investigate an approach to focusing our representation on audio transients corresponding to soundtrack events. These event-related features can reflect the "foreground" of the soundtrack and capture its short-term temporal structure better than conventional frame-based statistics. We evaluate our method on a test set of 1873 YouTube videos labeled with 25 semantic concepts. Retrieval results based on transient features alone are comparable to an MFCC-based system, and fusing the two representations achieves a relative improvement of 7.5% in mean average precision (MAP).
ISBN: 9781457705380, 1457705389
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2011.5946443
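The abstract reports retrieval quality as mean average precision (MAP) over 25 semantic concepts. As a minimal sketch of how that metric is computed (function names and the sample rankings below are illustrative, not from the paper):

```python
# Sketch of mean average precision (MAP), the retrieval metric cited in the
# abstract. Each ranking is a list of 0/1 relevance labels for one concept's
# ranked video list; all names and data here are illustrative.

def average_precision(ranked_relevance):
    """Average precision for one query: mean of precision@k over relevant ranks."""
    hits = 0
    precisions = []
    for k, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            precisions.append(hits / k)  # precision at this relevant rank
    return sum(precisions) / hits if hits else 0.0

def mean_average_precision(all_rankings):
    """MAP: average precision averaged over all queries (here, concepts)."""
    return sum(average_precision(r) for r in all_rankings) / len(all_rankings)

# Hypothetical example: two concepts, 1 = relevant video at that rank.
rankings = [
    [1, 0, 1, 0],  # AP = (1/1 + 2/3) / 2
    [0, 1, 1, 0],  # AP = (1/2 + 2/3) / 2
]
print(round(mean_average_precision(rankings), 4))  # prints 0.7083
```

A "relative improvement of 7.5% in MAP" from fusion then means the fused system's MAP is 1.075 times the MFCC-only baseline's MAP.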