Audio Matters in Visual Attention

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 24, No. 11, pp. 1992-2003
Main Authors: Chen, Yanxiang; Nguyen, Tam V.; Kankanhalli, Mohan; Yuan, Jun; Yan, Shuicheng; Wang, Meng
Format: Journal Article
Language: English
Published: New York: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.11.2014
Summary: There is a dearth of information on how perceived auditory information guides image-viewing behavior. To investigate auditory-driven visual attention, we first generated a human eye-fixation database from a pool of 200 static images and 400 image-audio pairs viewed by 48 subjects. The eye-tracking data for the image-audio pairs were captured while participants viewed images immediately after exposure to coherent or incoherent audio samples. The database was analyzed in terms of time to first fixation, fixation duration on the target object, entropy, area under the ROC curve (AUC), and saliency ratio. It was found that coherent audio information is an important cue for enhancing the feature-specific response to the target object, whereas incoherent audio information attenuates this response. Finally, a system was developed to predict image-viewing behavior under the influence of different audio sources. Its top-down module, discussed in detail, combines auditory estimation based on a Gaussian mixture model-maximum a posteriori-universal background model (GMM-MAP-UBM) structure with visual estimation based on a conditional random field model and sparse latent variables. Evaluation experiments show that the proposed models exhibit strong consistency with eye fixations.
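The abstract names a GMM-MAP-UBM structure for the auditory-estimation step. As a rough illustration of that general technique (not the paper's implementation), the sketch below trains a universal background model on pooled features, MAP-adapts its means toward class-specific data, and scores a test segment with a log-likelihood ratio. All data, dimensions, and the relevance factor are illustrative assumptions.

```python
# Minimal GMM-MAP-UBM sketch: a hedged illustration of the technique named
# in the abstract, not the authors' actual system. Synthetic data throughout.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# 1) Train a universal background model (UBM) on pooled audio-like features
#    (13 dimensions here, loosely MFCC-shaped; purely an assumption).
background = rng.normal(size=(2000, 13))
ubm = GaussianMixture(n_components=8, covariance_type="diag", random_state=0)
ubm.fit(background)

# 2) MAP-adapt the UBM component means toward class-specific enrollment data.
def map_adapt_means(ubm, feats, relevance=16.0):
    resp = ubm.predict_proba(feats)            # (T, K) component posteriors
    n_k = resp.sum(axis=0)                     # soft counts per component
    f_k = resp.T @ feats                       # (K, D) first-order statistics
    data_mean = f_k / np.maximum(n_k, 1e-10)[:, None]
    alpha = (n_k / (n_k + relevance))[:, None] # data/prior interpolation weight
    return alpha * data_mean + (1 - alpha) * ubm.means_

class_feats = rng.normal(loc=0.5, size=(300, 13))   # enrollment data for one class
adapted_means = map_adapt_means(ubm, class_feats)

# 3) Build the adapted model (shared weights/covariances, adapted means)
#    and score a test segment as a per-frame log-likelihood ratio vs. the UBM.
adapted = GaussianMixture(n_components=8, covariance_type="diag")
adapted.weights_ = ubm.weights_
adapted.means_ = adapted_means
adapted.covariances_ = ubm.covariances_
adapted.precisions_cholesky_ = ubm.precisions_cholesky_

test = rng.normal(loc=0.5, size=(100, 13))
llr = adapted.score(test) - ubm.score(test)    # positive -> matches the class
print(f"log-likelihood ratio per frame: {llr:.3f}")
```

Adapting only the means (with weights and covariances shared with the UBM) is the classic simplification of MAP adaptation; the relevance factor controls how far the adapted model moves from the background prior toward the enrollment data.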
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2014.2329380