Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization

Bibliographic Details
Published in IEEE/ACM transactions on audio, speech, and language processing Vol. 26; no. 12; pp. 2436 - 2446
Main Authors El Badawy, Dalia; Dokmanic, Ivan
Format Journal Article
Language English
Published Piscataway IEEE 01.12.2018
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Abstract Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
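The idea in the abstract, direction-dependent signatures combined with a learned non-negative dictionary, can be illustrated with a small sketch. This is not the authors' algorithm: it is a simplified illustration that assumes the magnitude signatures of all candidate directions are known, that a speech dictionary `W` has already been learned, and that a single source is active. It fits non-negative activations for each direction-filtered dictionary using standard multiplicative updates and picks the direction with the smallest residual. All names (`fit_activations`, `localize`, `signatures`, `W`) are hypothetical, and the data in the demo are synthetic.

```python
import numpy as np


def fit_activations(Y, D, n_iter=500, eps=1e-12, seed=0):
    """Fit non-negative X so that D @ X approximates Y, with D held fixed.

    Uses the classic multiplicative update for the Frobenius-norm objective.
    """
    rng = np.random.default_rng(seed)
    X = rng.random((D.shape[1], Y.shape[1])) + eps
    DtY = D.T @ Y
    DtD = D.T @ D
    for _ in range(n_iter):
        X *= DtY / (DtD @ X + eps)  # keeps X non-negative by construction
    return X


def localize(Y, signatures, W):
    """Return (best_direction_index, residuals) for a magnitude spectrogram Y.

    signatures: (n_directions, n_freq) magnitude responses (assumed known).
    W:          (n_freq, n_atoms) non-negative source dictionary (assumed learned).
    """
    residuals = []
    for h in signatures:
        D = h[:, None] * W  # dictionary as seen through direction h
        X = fit_activations(Y, D)
        residuals.append(float(np.linalg.norm(Y - D @ X)))
    return int(np.argmin(residuals)), residuals


if __name__ == "__main__":
    # Synthetic demo: random dictionary, random signatures, one true direction.
    rng = np.random.default_rng(1)
    W = rng.random((64, 5))
    signatures = rng.uniform(0.1, 1.0, size=(6, 64))
    Y = signatures[3][:, None] * (W @ rng.random((5, 40)))
    est, residuals = localize(Y, signatures, W)
    print("estimated direction index:", est)
```

Comparing per-direction residuals is a deliberately crude stand-in for a joint formulation; the subject terms of this record (group sparsity, universal speech model) suggest the published method is more elaborate than this sketch.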
Author Dokmanic, Ivan
El Badawy, Dalia
Author_xml – sequence: 1
  givenname: Dalia
  surname: El Badawy
  fullname: El Badawy, Dalia
  email: dalia.elbadawy@epfl.ch
  organization: Ecole Polytech. Fed. de Lausanne, Lausanne, Switzerland
– sequence: 2
  givenname: Ivan
  surname: Dokmanic
  fullname: Dokmanic, Ivan
  email: dokmanic@illinois.edu
  organization: Dept. of Electr. & Comput. Eng., Univ. of Illinois at Urbana-Champaign, Urbana, IL, USA
CODEN ITASD8
CitedBy_id crossref_primary_10_1016_j_jsv_2023_117671
crossref_primary_10_1002_advs_201902271
crossref_primary_10_3390_s23020769
crossref_primary_10_3389_frsip_2024_1341087
crossref_primary_10_3389_fphy_2022_1024964
ContentType Journal Article
Copyright Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
Copyright_xml – notice: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2018
DOI 10.1109/TASLP.2018.2867081
Discipline Engineering
EISSN 2329-9304
EndPage 2446
ExternalDocumentID 10_1109_TASLP_2018_2867081
8445656
Genre orig-research
GrantInformation_xml – fundername: Swiss National Science Foundation
  grantid: 20FP-1 151073
ISSN 2329-9290
IsDoiOpenAccess false
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Issue 12
Language English
ORCID 0000-0002-0931-1601
0000-0001-7132-5214
PublicationDate 2018-12-01
PublicationDateYYYYMMDD 2018-12-01
PublicationDate_xml – month: 12
  year: 2018
  text: 2018-12-01
  day: 01
PublicationDecade 2010
PublicationPlace Piscataway
PublicationPlace_xml – name: Piscataway
PublicationTitle IEEE/ACM transactions on audio, speech, and language processing
PublicationTitleAbbrev TASLP
PublicationYear 2018
Publisher IEEE
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Publisher_xml – name: IEEE
– name: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
StartPage 2436
SubjectTerms Acoustics
Algorithms
Direction of arrival
Direction-of-arrival estimation
Factorization
Frequency response
group sparsity
Human performance
Ill posed problems
Inverse problems
Microphones
monaural localization
non-negative matrix factorization
Scattering
Sound localization
sound scattering
Speech processing
universal speech model
White noise
Title Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization
URI https://ieeexplore.ieee.org/document/8445656
https://www.proquest.com/docview/2107921938
Volume 26