Audio-visual intent-to-speak detection for human-computer interaction

Bibliographic Details
Published in	2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), Vol. 4, pp. 2373-2376
Main Authors	De Cuetos, P., Neti, C., Senior, A.W.
Format	Conference Proceeding
Language	English
Published	IEEE 2000
Subjects	Face detection; Humans; Keyboards; Mice; Mouth; Shape; Speech recognition; Text processing; USA Councils
Online Access	Get full text

Abstract	This paper introduces a practical system that aims to detect a user's intent to speak to a computer by considering both audio and visual cues. The system is designed to turn on the microphone for speech recognition intuitively, without requiring a mouse click, and thereby to make communication between users and computers more human-like. The first step is to detect a frontal face in an image from a simple desktop video camera, using well-known image processing techniques for face and facial feature detection on a single image. The second step is audio-visual speech event detection, which combines visual and audio indications of speech. In this paper, we consider visual measures of speech activity as well as audio energy to determine whether the previously detected user is actually speaking.
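The two-step idea in the abstract (first require a frontal face, then combine a visual measure of speech activity with audio energy) can be illustrated with a minimal sketch. This is not the authors' implementation: the record gives no formulas, so the `Frame` fields, the mouth-opening measure, the simple AND fusion rule, and both threshold values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Frame:
    """One video frame plus the audio captured during it (hypothetical layout)."""
    face_detected: bool             # result of some frontal-face detector (assumed given)
    mouth_opening: float            # assumed mouth-shape measure, e.g. lip gap in pixels
    audio_samples: Sequence[float]  # audio samples aligned with this frame


def audio_energy(samples: Sequence[float]) -> float:
    """Mean squared amplitude of the samples; 0.0 for an empty window."""
    if not samples:
        return 0.0
    return sum(s * s for s in samples) / len(samples)


def visual_activity(openings: Sequence[float]) -> float:
    """Mean absolute frame-to-frame change in mouth opening."""
    if len(openings) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(openings, openings[1:])]
    return sum(diffs) / len(diffs)


def intends_to_speak(
    frames: Sequence[Frame],
    energy_threshold: float = 0.01,   # illustrative value, not from the paper
    activity_threshold: float = 0.5,  # illustrative value, not from the paper
) -> bool:
    """Step 1: require a frontal face in every frame of the window.
    Step 2: require both audio energy and mouth movement above threshold."""
    if not frames or not all(f.face_detected for f in frames):
        return False
    mean_energy = sum(audio_energy(f.audio_samples) for f in frames) / len(frames)
    activity = visual_activity([f.mouth_opening for f in frames])
    return mean_energy > energy_threshold and activity > activity_threshold


if __name__ == "__main__":
    window = [Frame(True, o, [0.5, -0.5]) for o in (2.0, 4.0, 2.0, 4.0)]
    print(intends_to_speak(window))
```

Requiring both cues means that background noise with a still mouth, or silent mouth movement, does not open the microphone in this sketch; a real system could weight or learn the fusion instead of using fixed thresholds.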
Author	De Cuetos, P.; Neti, C.; Senior, A.W.
Author_xml	– sequence: 1 givenname: P. surname: De Cuetos fullname: De Cuetos, P. organization: Inst. Eurecom, Sophia-Antipolis, France – sequence: 2 givenname: C. surname: Neti fullname: Neti, C. – sequence: 3 givenname: A.W. surname: Senior fullname: Senior, A.W.
ContentType	Conference Proceeding
DBID	6IE 6IH CBEJK RIE RIO
DOI	10.1109/ICASSP.2000.859318
DatabaseName	IEEE Electronic Library (IEL) Conference Proceedings IEEE Proceedings Order Plan (POP) 1998-present by volume IEEE Xplore All Conference Proceedings IEEE Xplore IEEE Proceedings Order Plans (POP) 1998-present
DatabaseTitleList
Database_xml	– sequence: 1 dbid: RIE name: IEEE Xplore Digital Library url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher
DeliveryMethod	fulltext_linktorsrc
Discipline	Engineering
EISSN	2379-190X
EndPage	2376
ExternalDocumentID	859318
GroupedDBID	23M 29P 6IE 6IF 6IH 6IK 6IL 6IM 6IN AAJGR ABLEC ACGFS ADZIZ ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CBEJK CHZPO IEGSK IJVOP IPLJI JC5 M43 OCL RIE RIL RIO RNS
ID	FETCH-LOGICAL-i172t-a903fd6c55b12a8a9d3c2b0c58bf5abd5edd789be91e0bae86612d51c50e4c583
IEDL.DBID	RIE
ISBN	9780780362932 0780362934
ISSN	1520-6149
IngestDate	Wed Jun 26 19:23:09 EDT 2024
IsPeerReviewed	false
IsScholarly	true
Language	English
LinkModel	DirectLink
MergedId	FETCHMERGED-LOGICAL-i172t-a903fd6c55b12a8a9d3c2b0c58bf5abd5edd789be91e0bae86612d51c50e4c583
ParticipantIDs	ieee_primary_859318
PublicationCentury	2000
PublicationDate	20000000
PublicationDateYYYYMMDD	2000-01-01
PublicationDate_xml	– year: 2000 text: 20000000
PublicationDecade	2000
PublicationTitle	2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100)
PublicationTitleAbbrev	ICASSP
PublicationYear	2000
Publisher	IEEE
Publisher_xml	– name: IEEE
SSID	ssj0008748 ssj0000454833
Score	1.5669456
SourceID	ieee
SourceType	Publisher
StartPage	2373
URI	https://ieeexplore.ieee.org/document/859318
Volume	4