Per-Channel Energy Normalization: Why and How

In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently shown to outperform the pointwise logarithm of mel-frequency spectrogram (logmelspec) as an acoustic frontend. This letter investigates the ad...

Full description

Saved in:
Bibliographic Details
Published inIEEE signal processing letters Vol. 26; no. 1; pp. 39 - 43
Main Authors Lostanlen, Vincent, Salamon, Justin, Cartwright, Mark, McFee, Brian, Farnsworth, Andrew, Kelling, Steve, Bello, Juan Pablo
Format Journal Article
LanguageEnglish
Published New York IEEE 01.01.2019
The Institute of Electrical and Electronics Engineers, Inc. (IEEE)
Institute of Electrical and Electronics Engineers
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:In the context of automatic speech recognition and acoustic event detection, an adaptive procedure named per-channel energy normalization (PCEN) has recently shown to outperform the pointwise logarithm of mel-frequency spectrogram (logmelspec) as an acoustic frontend. This letter investigates the adequacy of PCEN for spectrogram-based pattern recognition in far-field noisy recordings, both from theoretical and practical standpoints. First, we apply PCEN on various datasets of natural acoustic environments and find empirically that it Gaussianizes distributions of magnitudes while decorrelating frequency bands. Second, we describe the asymptotic regimes of each component in PCEN: temporal integration, gain control, and dynamic range compression. Third, we give practical advice for adapting PCEN parameters to the temporal properties of the noise to be mitigated, the signal to be enhanced, and the choice of time-frequency representation. As it converts a large class of real-world soundscapes into additive white Gaussian noise, PCEN is a computationally efficient frontend for robust detection and classification of acoustic events in heterogeneous environments.
ISSN:1070-9908
1558-2361
DOI:10.1109/LSP.2018.2878620