Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization
Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the vari...
Published in | IEEE/ACM transactions on audio, speech, and language processing Vol. 26; no. 12; pp. 2436 - 2446 |
---|---|
Main Authors | El Badawy, Dalia; Dokmanic, Ivan |
Format | Journal Article |
Language | English |
Published | Piscataway; IEEE; 01.12.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Abstract | Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach. |
---|---|
Author | Dokmanic, Ivan; El Badawy, Dalia |
Author_xml | – sequence: 1 givenname: Dalia surname: El Badawy fullname: El Badawy, Dalia email: dalia.elbadawy@epfl.ch organization: Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland – sequence: 2 givenname: Ivan surname: Dokmanic fullname: Dokmanic, Ivan email: dokmanic@illinois.edu organization: Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA |
CODEN | ITASD8 |
CitedBy_id | crossref_primary_10_1016_j_jsv_2023_117671 crossref_primary_10_1002_advs_201902271 crossref_primary_10_3390_s23020769 crossref_primary_10_3389_frsip_2024_1341087 crossref_primary_10_3389_fphy_2022_1024964 |
ContentType | Journal Article |
Copyright | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
Copyright_xml | – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018 |
DBID | 97E RIA RIE AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
DOI | 10.1109/TASLP.2018.2867081 |
DatabaseName | IEEE All-Society Periodicals Package (ASPP) 2005-present; IEEE All-Society Periodicals Package (ASPP) 1998–Present; IEEE/IET Electronic Library (IEL); CrossRef; Computer and Information Systems Abstracts; Technology Research Database; ProQuest Computer Science Collection; Advanced Technologies Database with Aerospace; Computer and Information Systems Abstracts Academic; Computer and Information Systems Abstracts Professional |
DatabaseTitle | CrossRef; Computer and Information Systems Abstracts; Technology Research Database; Computer and Information Systems Abstracts – Academic; Advanced Technologies Database with Aerospace; ProQuest Computer Science Collection; Computer and Information Systems Abstracts Professional |
DatabaseTitleList | Computer and Information Systems Abstracts |
Database_xml | – sequence: 1 dbid: RIE name: IEEE Electronic Library Online url: https://proxy.k.utb.cz/login?url=https://ieeexplore.ieee.org/ sourceTypes: Publisher |
DeliveryMethod | fulltext_linktorsrc |
Discipline | Engineering |
EISSN | 2329-9304 |
EndPage | 2446 |
ExternalDocumentID | 10_1109_TASLP_2018_2867081 8445656 |
Genre | orig-research |
GrantInformation_xml | – fundername: Swiss National Science Foundation grantid: 20FP-1 151073 |
GroupedDBID | 0R~ 4.4 6IK 97E AAJGR AAKMM AALFJ AASAJ AAWTV ABQJQ ABVLG ACIWK ACM ADBCU ADPZR AEBYY AENSD AFWIH AFWXC AIKLT AKJIK ALMA_UNASSIGNED_HOLDINGS BEFXN BFFAM BGNUA BKEBE BPEOZ CCLIF EBS EJD GUFHI HGAVV IFIPE IPLJI JAVBF LHSKQ M43 OCL PQQKQ RIA RIE RNS ROL AAYXX CITATION 7SC 8FD JQ2 L7M L~C L~D |
ID | FETCH-LOGICAL-c344t-d655faccde3ab78d49a1f6fce748efe4664087f7d2ded1bb08c182c6f1094a5c3 |
IEDL.DBID | RIE |
ISSN | 2329-9290 |
IngestDate | Thu Oct 10 17:52:54 EDT 2024 Fri Aug 23 00:55:35 EDT 2024 Wed Jun 26 19:28:15 EDT 2024 |
IsDoiOpenAccess | false |
IsOpenAccess | true |
IsPeerReviewed | true |
IsScholarly | true |
Issue | 12 |
Language | English |
LinkModel | DirectLink |
MergedId | FETCHMERGED-LOGICAL-c344t-d655faccde3ab78d49a1f6fce748efe4664087f7d2ded1bb08c182c6f1094a5c3 |
ORCID | 0000-0002-0931-1601 0000-0001-7132-5214 |
PQID | 2107921938 |
PQPubID | 85426 |
PageCount | 11 |
ParticipantIDs | proquest_journals_2107921938 crossref_primary_10_1109_TASLP_2018_2867081 ieee_primary_8445656 |
PublicationCentury | 2000 |
PublicationDate | 2018-12-01 |
PublicationDateYYYYMMDD | 2018-12-01 |
PublicationDate_xml | – month: 12 year: 2018 text: 2018-12-01 day: 01 |
PublicationDecade | 2010 |
PublicationPlace | Piscataway |
PublicationPlace_xml | – name: Piscataway |
PublicationTitle | IEEE/ACM transactions on audio, speech, and language processing |
PublicationTitleAbbrev | TASLP |
PublicationYear | 2018 |
Publisher | IEEE; The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
Publisher_xml | – name: IEEE – name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE) |
SSID | ssj0001079974 |
Score | 2.2054214 |
SourceID | proquest crossref ieee |
SourceType | Aggregation Database Publisher |
StartPage | 2436 |
SubjectTerms | Acoustics; Algorithms; Direction of arrival; Direction-of-arrival estimation; Factorization; Frequency response; group sparsity; Human performance; Ill posed problems; Inverse problems; Microphones; monaural localization; non-negative matrix factorization; Scattering; Sound localization; sound scattering; Speech processing; universal speech model; White noise |
Title | Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization |
URI | https://ieeexplore.ieee.org/document/8445656 https://www.proquest.com/docview/2107921938 |
Volume | 26 |
hasFullText | 1 |
inHoldings | 1 |
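
The abstract states that knowing the direction-dependent signatures is sufficient to localize sources of white noise. As a rough illustration of that idea only, and not the authors' algorithm, the Python sketch below matches an averaged power spectrum against a table of per-direction signatures using cosine similarity. Everything here is an assumption made for the example: the function name `localize_white_noise`, the random stand-in signatures (real ones would be measured for the scattering structure), the exponential model for per-frame noise power, and the choice of cosine similarity as the matching rule.

```python
import numpy as np


def localize_white_noise(observed_power, signatures):
    """Return the index of the candidate direction that best matches.

    observed_power: (F,) average power spectrum seen at the microphone.
    signatures:     (D, F) squared magnitude response per candidate direction.
    Hypothetical helper for illustration; not taken from the paper.
    """
    # Normalize both sides so the unknown source power does not matter.
    obs = observed_power / np.linalg.norm(observed_power)
    sig = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    # Cosine similarity between the observation and every signature.
    return int(np.argmax(sig @ obs))


rng = np.random.default_rng(0)
num_directions, num_bins, num_frames = 36, 128, 200

# Assumed stand-in for measured direction-dependent signatures.
signatures = rng.uniform(0.1, 1.0, size=(num_directions, num_bins))

# Simulated white-noise source: flat expected spectrum, with per-frame
# power in each frequency bin drawn from an exponential distribution.
true_direction = 7
source_power = rng.exponential(scale=2.0, size=(num_frames, num_bins))
observed = (source_power * signatures[true_direction]).mean(axis=0)

estimate = localize_white_noise(observed, signatures)
print(estimate, true_direction)
```

Averaging over frames is what makes the flat-spectrum assumption useful: the source contributes roughly a constant factor, so the observation is close to a scaled copy of one signature. Speech does not have a flat spectrum, which is why the abstract describes that case as ill-posed and regularizes it with learned non-negative dictionaries.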