Audio-visual intent-to-speak detection for human-computer interaction

Bibliographic Details
Published in: 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100), Vol. 4, pp. 2373-2376
Main Authors: De Cuetos, P.; Neti, C.; Senior, A.W.
Format: Conference Proceeding
Language: English
Publisher: IEEE, 2000
Abstract: Introduces a practical system that aims to detect a user's intent to speak to a computer by considering both audio and visual cues. The system is designed to turn on the microphone for speech recognition intuitively, without the user having to click a mouse, thus making communication between users and computers more human-like. The first step is to detect a frontal face in an image from a simple desktop video camera, using well-known image processing techniques for face and facial feature detection on a single image. The second step is audio-visual speech event detection, which combines visual and audio indications of speech. In this paper, we consider visual measures of speech activity as well as audio energy to determine whether the previously detected user is actually speaking.
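As a rough illustration of the second step (combining a visual measure of speech activity with audio energy), the sketch below shows one simple way such a fusion could work. It is not the authors' implementation: the function names, the mouth-region frame-difference measure, the AND-fusion rule, the thresholds, and the `min_run` smoothing are all hypothetical assumptions. It also assumes that face and mouth detection (the first step) has already produced mouth-region crops, and that audio frames are aligned one-to-one with video frames.

```python
from typing import List, Sequence

# Hypothetical type: a mouth crop is a 2-D grid of grey-level pixel values.
Crop = Sequence[Sequence[float]]


def frame_energy(samples: Sequence[float], frame_len: int) -> List[float]:
    """Short-time audio energy: mean squared amplitude of each non-overlapping frame."""
    n_frames = len(samples) // frame_len
    return [
        sum(x * x for x in samples[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def mouth_activity(crops: Sequence[Crop]) -> List[float]:
    """Assumed visual measure: mean absolute pixel change in the mouth region
    between consecutive video frames (0.0 for the first frame)."""
    if not crops:
        return []
    scores = [0.0]
    for prev, cur in zip(crops, crops[1:]):
        diffs = [abs(c - p) for row_p, row_c in zip(prev, cur) for p, c in zip(row_p, row_c)]
        scores.append(sum(diffs) / len(diffs) if diffs else 0.0)
    return scores


def speech_events(energy: Sequence[float], activity: Sequence[float],
                  energy_thresh: float, activity_thresh: float) -> List[bool]:
    """Hypothetical AND-fusion: a frame counts as speech only if both the audio
    energy and the visual activity exceed their thresholds."""
    return [e > energy_thresh and a > activity_thresh for e, a in zip(energy, activity)]


def should_open_mic(face_detected: bool, events: Sequence[bool], min_run: int = 3) -> bool:
    """Open the microphone only if a frontal face was detected and speech is
    observed for at least min_run consecutive frames (assumed smoothing rule)."""
    if not face_detected:
        return False
    run = 0
    for speaking in events:
        run = run + 1 if speaking else 0
        if run >= min_run:
            return True
    return False
```

AND-fusion is a deliberately conservative choice here: background noise with a still mouth, or lip movement with no sound, does not open the microphone on its own. A weighted score or a trained classifier over both cues would be an obvious alternative.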
Affiliation: De Cuetos, P.: Inst. Eurecom, Sophia-Antipolis, France
DOI: 10.1109/ICASSP.2000.859318
Discipline: Engineering
EISSN: 2379-190X
ISBN: 9780780362932; 0780362934
ISSN: 1520-6149
Conference abbreviation: ICASSP
Subject terms: Face detection; Humans; Keyboards; Mice; Mouth; Shape; Speech recognition; Text processing; USA Councils
URL: https://ieeexplore.ieee.org/document/859318