Audiovisual integration with Segment Models for tennis video parsing

Bibliographic Details
Published in	Computer Vision and Image Understanding, Vol. 111, no. 2, pp. 142-154
Main Authors	Delakis, Manolis; Gravier, Guillaume; Gros, Patrick
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier Inc, 01.08.2008
Subjects	Applied sciences | Artificial intelligence | Computer Science | Computer science; control theory; systems | Engineering Sciences | Exact sciences and technology | Hidden Markov Models | Multimodal fusion | Pattern recognition. Digital image processing. Computational geometry | Segment Models | Signal and Image Processing | Video indexing | Video summarization | Image processing | Video signal | Very large databases | Syntactic analysis | Modeling | Audiovisual | Video databases | Segmental | Audiovisual equipment | Content analysis | Tennis | Computer vision | Abstract | Synchronization | Video recording | Image analysis | Hearing | Sampling rate | Hidden Markov model | Data fusion | Data models | Automatic analysis | Indexing

More Information
Summary:	Automatic video content analysis is an emerging research subject with numerous applications to large video databases and personal video recording systems. The aim of this study is to fuse multimodal information in order to automatically parse the underlying structure of tennis broadcasts. The frame-based observation distributions of Hidden Markov Models are too strict for modeling heterogeneous audiovisual data. We propose instead the use of segmental features, within the framework of Segment Models, to overcome this limitation and extend the synchronization points to the segment boundaries. Considering each segment as a video scene, auditory and visual features collected inside the scene boundaries can thus be sampled and modeled with their native sampling rates and models. Experimental results on a corpus of 15 hours of tennis video demonstrate that Segment Models with synchronous audiovisual fusion outperform Hidden Markov Models. Results with asynchronous fusion, however, are less encouraging.
ISSN:	1077-3142; 1090-235X
DOI:	10.1016/j.cviu.2007.09.002