BeamLearning: An end-to-end deep learning approach for the angular localization of sound sources using raw multichannel acoustic pressure data

Sound source localization using multichannel signal processing has been a subject of active research for decades. In recent years, the use of deep learning in audio signal processing has significantly improved the performances for machine hearing. This has motivated the scientific community to also...

Full description

Saved in:

Bibliographic Details
Published in	The Journal of the Acoustical Society of America Vol. 149; no. 6; pp. 4248 - 4263
Main Authors	Pujol, Hadrien, Bavu, Éric, Garcia, Alexandre
Format	Journal Article
Language	English
Published	Acoustical Society of America 01.06.2021
Series	Special issue on Machine Learning in Acoustics
Subjects	Acoustics Engineering Sciences Machine Learning Signal and Image processing Statistics
Online Access	Get full text

Cover

Loading…

More Information
Summary:	Sound source localization using multichannel signal processing has been a subject of active research for decades. In recent years, the use of deep learning in audio signal processing has significantly improved the performances for machine hearing. This has motivated the scientific community to also develop machine learning strategies for source localization applications. This paper presents BeamLearning, a multiresolution deep learning approach that allows the encoding of relevant information contained in unprocessed time-domain acoustic signals captured by microphone arrays. The use of raw data aims at avoiding the simplifying hypothesis that most traditional model-based localization methods rely on. Benefits of its use are shown for real-time sound source two-dimensional localization tasks in reverberating and noisy environments. Since supervised machine learning approaches require large-sized, physically realistic, precisely labelled datasets, a fast graphics processing unit-based computation of room impulse responses was developed using fractional delays for image source models. A thorough analysis of the network representation and extensive performance tests are carried out using the BeamLearning network with synthetic and experimental datasets. Obtained results demonstrate that the BeamLearning approach significantly outperforms the wideband MUSIC and steered response power-phase transform methods in terms of localization accuracy and computational efficiency in the presence of heavy measurement noise and reverberation.
Bibliography:	ObjectType-Article-1 SourceType-Scholarly Journals-1 ObjectType-Feature-2 content type line 23
ISSN:	0001-4966 1520-8524
DOI:	10.1121/10.0005046