Indoor sound source localization under reverberation by extracting the features of sample covariance

Bibliographic Details
Published in: Applied Acoustics, Vol. 210, p. 109453
Main Authors: Yan, Jiajun; Zhao, Wenlai; Wu, Yue Ivan; Zhou, Yingjie
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.07.2023
Summary:
• A three-dimensional sound source localization method based on deep learning for highly reverberant indoor environments.
• A DNN trained with time-domain features obtained from the sample covariance of complex-valued analytic signals at the microphone array.
• Enhanced indoor speech source localization performance that holds across different reverberation levels, room dimensions, speech signals, microphone array offsets, and numbers of snapshots.

Three-dimensional indoor sound source localization is well known to be a challenging task because of the complicated mechanics of reverberation. Conventional model-based methods generally require prior knowledge of the microphone array geometry, the environment’s parameters, and the statistics of the signal and noise. Data-based methods, on the other hand, can work without all or part of this prior knowledge, but their localization performance depends on the features extracted from the data and on the nature of the neural network. In this work, a feature vector constructed from the sample covariance matrix of the microphone array data is fed to a purpose-designed deep neural network. Numerical simulation results show that the proposed method outperforms the method of [21] in both ranging and direction finding, and can therefore locate a human-speech source in a three-dimensional reverberant room. An average relative localization error below 3% can be reached, and this holds for room reverberation times T60 from 0 to 600 ms when the signal-to-noise ratio is not below 0 dB. Moreover, the robustness of the proposed method is verified across different scenarios: the increase in relative localization error is below 5% for various room dimensions, below 10% for different speech signals, and below 3% for microphone array offsets.
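The summary does not spell out how the feature vector is built, so the following is only a minimal sketch of one plausible construction, not the authors' implementation. It assumes the analytic signal is obtained with an FFT-based Hilbert transform, that the features are the real and imaginary parts of the upper triangle of the sample covariance matrix, and that the vector is scaled to unit norm. The function names `analytic_signal` and `covariance_features`, the array size, and the snapshot count are all hypothetical.

```python
import numpy as np


def analytic_signal(x):
    """Complex analytic signal along the last axis of a real array."""
    x = np.asarray(x, dtype=float)
    n = x.shape[-1]
    spectrum = np.fft.fft(x, axis=-1)
    # One-sided weights: keep DC (and Nyquist when n is even),
    # double the positive frequencies, zero the negative ones.
    h = np.zeros(n)
    h[0] = 1.0
    if n % 2 == 0:
        h[n // 2] = 1.0
        h[1:n // 2] = 2.0
    else:
        h[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * h, axis=-1)


def covariance_features(x):
    """Map M-channel real snapshots (M x N) to a real vector of length M*M.

    Assumed layout (not taken from the paper): real parts of the upper
    triangle including the diagonal, then imaginary parts of the strictly
    upper triangle, scaled to unit Euclidean norm.
    """
    z = analytic_signal(x)
    n = z.shape[-1]
    r = z @ z.conj().T / n                  # M x M Hermitian sample covariance
    iu = np.triu_indices(r.shape[0])        # upper triangle with diagonal
    il = np.triu_indices(r.shape[0], k=1)   # strictly upper triangle
    feats = np.concatenate([r[iu].real, r[il].imag])
    norm = np.linalg.norm(feats)
    return feats / norm if norm > 0 else feats


# Usage with synthetic data: 4 hypothetical microphones, 256 snapshots.
rng = np.random.default_rng(0)
snapshots = rng.standard_normal((4, 256))
features = covariance_features(snapshots)
print(features.shape)  # (16,)
```

Because the covariance matrix is Hermitian, its upper triangle carries all of its information, which is why M microphones give exactly M*M real features in this layout. A vector of this kind would then serve as the input to a neural network regressor or classifier.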
ISSN: 0003-682X, 1872-910X
DOI: 10.1016/j.apacoust.2023.109453