Direction of Arrival With One Microphone, a Few LEGOs, and Non-Negative Matrix Factorization

Bibliographic Details
Published in	IEEE/ACM Transactions on Audio, Speech, and Language Processing, Vol. 26, no. 12, pp. 2436-2446
Main Authors	El Badawy, Dalia; Dokmanic, Ivan
Format	Journal Article
Language	English
Published	Piscataway: IEEE, 01.12.2018; The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Subjects	Acoustics; Algorithms; Direction of arrival; Direction-of-arrival estimation; Factorization; Frequency response; group sparsity; Human performance; Ill posed problems; Inverse problems; Microphones; monaural localization; non-negative matrix factorization; Scattering; Sound localization; sound scattering; Speech processing; universal speech model; White noise
Summary:	Conventional approaches to sound source localization require at least two microphones. It is known, however, that people with unilateral hearing loss can also localize sounds. Monaural localization is possible thanks to the scattering by the head, though it hinges on learning the spectra of the various sources. We take inspiration from this human ability to propose algorithms for accurate sound source localization using a single microphone embedded in an arbitrary scattering structure. The structure modifies the frequency response of the microphone in a direction-dependent way giving each direction a signature. While knowing those signatures is sufficient to localize sources of white noise, localizing speech is much more challenging: it is an ill-posed inverse problem, which we regularize by prior knowledge in the form of learned non-negative dictionaries. We demonstrate a monaural speech localization algorithm based on non-negative matrix factorization that does not depend on sophisticated, designed scatterers. In fact, we show experimental results with ad hoc scatterers made of LEGO bricks. Even with these rudimentary structures we can accurately localize arbitrary speakers; that is, we do not need to learn the dictionary for the particular speaker to be localized. Finally, we discuss multi-source localization and the related limitations of our approach.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2018.2867081
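
The summary states that knowing the direction-dependent signatures is sufficient to localize white-noise sources. The following is a minimal sketch of that idea only, not the authors' algorithm: it assumes each candidate direction has a known squared-magnitude frequency response (here drawn at random as a stand-in for a measured scatterer), simulates a white-noise source from one direction, and picks the direction whose signature has the highest cosine similarity with the observed average power spectrum. The function name `localize_white_noise`, the array sizes, and the random signatures are all hypothetical choices made for illustration. Speech localization, which the paper addresses with learned non-negative dictionaries, is not covered here.

```python
import numpy as np


def localize_white_noise(observed_power, signatures):
    """Return the best-matching direction index and all similarity scores.

    observed_power: (F,) average power spectrum measured at the microphone.
    signatures:     (D, F) assumed squared-magnitude response per direction.
    """
    # Normalize so the match does not depend on the unknown source loudness.
    obs = observed_power / np.linalg.norm(observed_power)
    sigs = signatures / np.linalg.norm(signatures, axis=1, keepdims=True)
    scores = sigs @ obs  # cosine similarity with each direction's signature
    return int(np.argmax(scores)), scores


rng = np.random.default_rng(0)
n_dirs, n_freqs, n_frames = 12, 257, 200

# Hypothetical signatures: random positive responses standing in for
# measurements of a real scattering structure.
signatures = rng.uniform(0.1, 1.0, size=(n_dirs, n_freqs))

true_dir = 7
# White noise has a flat expected power spectrum; per-frame power in each
# frequency bin is modeled here as exponentially distributed with unit mean.
source_power = rng.exponential(1.0, size=(n_frames, n_freqs))
# The scatterer shapes the spectrum; averaging over frames reduces variance.
observed = (source_power * signatures[true_dir]).mean(axis=0)

est_dir, scores = localize_white_noise(observed, signatures)
print("true direction:", true_dir, "estimated direction:", est_dir)
```

Because the source spectrum is flat on average, the observed spectrum is approximately a scaled copy of one signature, which is why a simple similarity search works in this setting. For speech the source spectrum is unknown and far from flat, so the same matching step would be ambiguous without a prior on the source.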