A deep neural network approach to speech bandwidth expansion

We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) by estimating the spectral mapping function from narrowband (4 kHz in bandwidth) to wideband (8 kHz in bandwidth). Log-spectrum power is used as the input and output features to perform the required nonlinear transfo...

Full description

Saved in:
Bibliographic Details
Published in2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) pp. 4395 - 4399
Main Authors Kehuang Li, Chin-Hui Lee
Format Conference Proceeding
LanguageEnglish
Published IEEE 01.04.2015
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:We propose a deep neural network (DNN) approach to speech bandwidth expansion (BWE) by estimating the spectral mapping function from narrowband (4 kHz in bandwidth) to wideband (8 kHz in bandwidth). Log-spectrum power is used as the input and output features to perform the required nonlinear transformation, and DNNs are trained to realize this high-dimensional mapping function. When evaluating the proposed approach on a large-scale 10-hour test set, we found that the DNN-expanded speech signals give excellent objective quality measures in terms of segmental signal-to-noise ratio and log-spectral distortion when compared with conventional BWE based on Gaussian mixture models (GMMs). Subjective listening tests also give a 69% preference score for DNN-expanded speech over 31% for GMM when the phase information is assumed known. For tests in real operation when the phase information is imaged from the given narrowband signal the preference comparison goes up to 84% versus 16%. A correct phase recovery can further increase the BWE performance for the proposed DNN method.
ISSN:1520-6149
2379-190X
DOI:10.1109/ICASSP.2015.7178801