Audio-visual intent-to-speak detection for human-computer interaction
Introduces a practical system that aims to detect a user's intent to speak to a computer, by considering both audio and visual cues. The whole system is designed to intuitively turn on the microphone for speech recognition without needing to click on a mouse, thus improving the human-like commu...
Published in | 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) Vol. 4; pp. 2373-2376 |
---|---|
Main Authors | De Cuetos, P.; Neti, C.; Senior, A.W. |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 2000 |
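Subjects | Face detection; Humans; Keyboards; Mice; Mouth; Shape; Speech recognition; Text processing; USA Councils |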
Abstract | Introduces a practical system that aims to detect a user's intent to speak to a computer, by considering both audio and visual cues. The whole system is designed to intuitively turn on the microphone for speech recognition without needing to click on a mouse, thus improving the human-like communication between users and computers. The first step is to detect a frontal face through a simple desktop video camera image, by using some well-known image processing techniques for face and facial feature detection on one image. The second step is an audio-visual speech event detection that combines both visual and audio indications of speech. In this paper, we consider visual measures of speech activity as well as audio energy to determine if the previously detected user is actually speaking or not. |
---|---|
Author | De Cuetos, P.; Neti, C.; Senior, A.W. |
Author_xml | 1. De Cuetos, P. (Inst. Eurecom, Sophia-Antipolis, France); 2. Neti, C.; 3. Senior, A.W. |
ContentType | Conference Proceeding |
DBID | 6IE 6IH CBEJK RIE RIO |
DOI | 10.1109/ICASSP.2000.859318 |
DatabaseName | IEEE Electronic Library (IEL) Conference Proceedings; IEEE Proceedings Order Plan (POP) 1998-present by volume; IEEE Xplore All Conference Proceedings; IEEE Xplore; IEEE Proceedings Order Plans (POP) 1998-present |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 2379-190X |
EndPage | 2376 |
ExternalDocumentID | 859318 |
IEDL.DBID | RIE |
ISBN | 9780780362932; 0780362934 |
ISSN | 1520-6149 |
IngestDate | Wed Jun 26 19:23:09 EDT 2024 |
IsPeerReviewed | false |
IsScholarly | true |
Language | English |
LinkModel | DirectLink |
ParticipantIDs | ieee_primary_859318 |
PublicationCentury | 2000 |
PublicationDate | 20000000 |
PublicationDateYYYYMMDD | 2000-01-01 |
PublicationDate_xml | – year: 2000 text: 20000000 |
PublicationDecade | 2000 |
PublicationTitle | 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100) |
PublicationTitleAbbrev | ICASSP |
PublicationYear | 2000 |
Publisher | IEEE |
Publisher_xml | – name: IEEE |
SSID | ssj0008748 ssj0000454833 |
Score | 1.5669456 |
SourceID | ieee |
SourceType | Publisher |
StartPage | 2373 |
SubjectTerms | Face detection; Humans; Keyboards; Mice; Mouth; Shape; Speech recognition; Text processing; USA Councils |
Title | Audio-visual intent-to-speak detection for human-computer interaction |
URI | https://ieeexplore.ieee.org/document/859318 |
Volume | 4 |
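
The abstract describes a two-step flow: first detect a frontal face, then decide whether that user is speaking by combining visual measures of speech activity with audio energy. The sketch below is only an illustration of that flow and is not taken from the paper. The function names, the mouth-opening measure, both thresholds, and the simple AND fusion rule are all assumptions made for the example.

```python
import math
from typing import Sequence


def frame_energy_db(samples: Sequence[float]) -> float:
    """Mean-square energy of one audio frame in dB, floored to avoid log(0)."""
    if not samples:
        return -120.0
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10.0 * math.log10(max(mean_sq, 1e-12))


def mouth_activity(openings: Sequence[float]) -> float:
    """Mean absolute frame-to-frame change of a (hypothetical) mouth-opening measure."""
    if len(openings) < 2:
        return 0.0
    diffs = [abs(b - a) for a, b in zip(openings, openings[1:])]
    return sum(diffs) / len(diffs)


def intends_to_speak(
    face_is_frontal: bool,
    samples: Sequence[float],
    openings: Sequence[float],
    energy_threshold_db: float = -30.0,   # assumed value, not from the paper
    activity_threshold: float = 0.05,     # assumed value, not from the paper
) -> bool:
    """Step 1: require a frontal face. Step 2: require both audio and visual evidence."""
    if not face_is_frontal:
        return False
    audio_active = frame_energy_db(samples) > energy_threshold_db
    visual_active = mouth_activity(openings) > activity_threshold
    # Assumed fusion rule: both cues must agree before the microphone is opened.
    return audio_active and visual_active
```

Requiring both cues is one conservative way to combine them: loud background audio with a still mouth, or mouth movement in silence, would not open the microphone. The paper may fuse the cues differently.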