Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization

Bibliographic Details
Published in	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, no. 12, pp. 2436-2446
Main Authors	El Badawy, Dalia; Dokmanic, Ivan
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.12.2018
Subjects	Acoustics; Algorithms; Direction of arrival; Direction-of-arrival estimation; Factorization; Frequency response; group sparsity; Human performance; Ill posed problems; Inverse problems; Microphones; monaural localization; non-negative matrix factorization; Scattering; Sound localization; sound scattering; Speech processing; universal speech model; White noise

Abstract	Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way, giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize with prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures, we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
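Illustration	The two ideas in the abstract, matching a recording against per-direction signatures and constraining a speech spectrogram with a fixed non-negative dictionary, can be sketched as below. This is a minimal sketch on synthetic data and not the authors' algorithm: the record says the paper uses group sparsity and a universal speech model, whereas the sketch simply fits each candidate direction separately with Euclidean multiplicative updates and keeps the best fit. All names (`localize_white_noise`, `nmf_activations`, `localize_with_dictionary`), the random signatures, and the random dictionary `W` are assumptions made for illustration only.

```python
import numpy as np


def localize_white_noise(power_spectrum, signatures):
    """Return the index of the signature closest (cosine similarity) to the observation.

    power_spectrum: (F,) average spectrum of the recording.
    signatures: (D, F) non-negative response per candidate direction (assumed known).
    """
    obs = power_spectrum / np.linalg.norm(power_spectrum)
    sig = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    return int(np.argmax(sig @ obs))


def nmf_activations(V, B, n_iter=500, eps=1e-12):
    """Fit V ~ B @ X with B fixed and X >= 0 (Euclidean multiplicative updates)."""
    X = np.ones((B.shape[1], V.shape[1]))
    for _ in range(n_iter):
        X *= (B.T @ V) / (B.T @ B @ X + eps)
    return X


def localize_with_dictionary(V, signatures, W):
    """Pick the direction whose signature-filtered dictionary reconstructs V best.

    Simplification: one independent fit per direction, no group-sparsity penalty.
    """
    errors = []
    for h in signatures:
        B = h[:, None] * W  # apply the direction signature to every dictionary atom
        X = nmf_activations(V, B)
        errors.append(np.linalg.norm(V - B @ X))
    return int(np.argmin(errors))


# Synthetic stand-ins (hypothetical): 8 directions, 64 frequency bins, 5 atoms.
rng = np.random.default_rng(0)
F, D, K, T = 64, 8, 5, 40
signatures = rng.uniform(0.1, 1.0, size=(D, F))
W = rng.uniform(0.0, 1.0, size=(F, K))
true_dir = 3

# White noise has a flat spectrum, so the recording mirrors the signature itself.
noise_obs = signatures[true_dir] * (1.0 + 0.05 * rng.standard_normal(F))

# A "speech-like" source: non-negative combinations of dictionary atoms over time.
V = signatures[true_dir][:, None] * (W @ rng.uniform(0.0, 1.0, size=(K, T)))

est_noise = localize_white_noise(noise_obs, signatures)
est_speech = localize_with_dictionary(V, signatures, W)
print(est_noise, est_speech)
```

In the white-noise case the signature alone identifies the direction. In the second case the source spectrum is unknown, so the dictionary supplies the prior knowledge that makes the fit well-posed.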
Author affiliations	El Badawy, Dalia (dalia.elbadawy@epfl.ch), Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland; Dokmanic, Ivan (dokmanic@illinois.edu), Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
CODEN	ITASD8
ContentType	Journal Article
Copyright	Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI	10.1109/TASLP.2018.2867081
Discipline	Engineering
EISSN	2329-9304
Genre	orig-research
Funding	Swiss National Science Foundation, grant 20FP-1 151073
ISSN	2329-9290
IsDoiOpenAccess	false
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
ORCID	0000-0002-0931-1601 0000-0001-7132-5214
PageCount	11
PublicationTitle	IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev	TASLP
URI	https://ieeexplore.ieee.org/document/8445656 https://www.proquest.com/docview/2107921938