Audio-visual intent-to-speak detection for human-computer interaction

Bibliographic Details
Published in	2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) Vol. 4; pp. 2373-2376
Main Authors	De Cuetos, P., Neti, C., Senior, A.W.
Format	Conference Proceeding
Language	English
Published	IEEE 2000
Subjects	Face detection; Humans; Keyboards; Mice; Mouth; Shape; Speech recognition; Text processing; USA Councils

More Information
Summary:	Introduces a practical system that aims to detect a user's intent to speak to a computer from both audio and visual cues. The system is designed to turn on the microphone for speech recognition intuitively, without requiring a mouse click, making communication between users and computers more human-like. The first step detects a frontal face in a single image from a simple desktop video camera, using well-known image processing techniques for face and facial feature detection. The second step is audio-visual speech event detection, which combines visual and audio indications of speech: visual measures of speech activity are considered together with audio energy to determine whether the previously detected user is actually speaking.
ISBN:	9780780362932 0780362934
ISSN:	1520-6149 2379-190X
DOI:	10.1109/ICASSP.2000.859318
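
Illustrative sketch:	The two-step decision described in the summary can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the mouth-openness measure, the AND fusion rule, and both thresholds are hypothetical, and the frontal-face result is assumed to come from a separate detector.

```python
import math
from typing import Sequence


def audio_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of one audio frame (samples assumed in [-1, 1])."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def visual_activity(mouth_openness: Sequence[float]) -> float:
    """Standard deviation of a mouth-openness value over recent video frames.

    The mouth-openness measure itself is hypothetical; the record only says
    that "visual measures of speech activity" are used.
    """
    n = len(mouth_openness)
    if n < 2:
        return 0.0
    mean = sum(mouth_openness) / n
    return math.sqrt(sum((m - mean) ** 2 for m in mouth_openness) / n)


def intends_to_speak(
    face_is_frontal: bool,
    samples: Sequence[float],
    mouth_openness: Sequence[float],
    energy_threshold: float = 0.01,    # assumed value, not from the paper
    activity_threshold: float = 0.05,  # assumed value, not from the paper
) -> bool:
    """Step 1: require a frontal face. Step 2: require both cues to agree."""
    if not face_is_frontal:
        return False
    loud_enough = audio_energy(samples) >= energy_threshold
    mouth_moving = visual_activity(mouth_openness) >= activity_threshold
    # Assumed fusion rule: a simple AND of the audio and visual cues.
    return loud_enough and mouth_moving
```

Requiring both cues means that background noise with a still mouth, or silent mouth movement, would not switch the microphone on in this sketch; a real system might weight or smooth the cues instead.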