Deep Belief Networks using discriminative features for phone recognition

Bibliographic Details
Published in: 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5060-5063
Main Authors: Mohamed, Abdel-rahman; Sainath, Tara N.; Dahl, George; Ramabhadran, Bhuvana; Hinton, Geoffrey E.; Picheny, Michael A.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2011

Summary: Deep Belief Networks (DBNs) are multi-layer generative models. They can be trained to model windows of coefficients extracted from speech and they discover multiple layers of features that capture the higher-order statistical structure of the data. These features can be used to initialize the hidden units of a feed-forward neural network that is then trained to predict the HMM state for the central frame of the window. Initializing with features that are good at generating speech makes the neural network perform much better than initializing with random weights. DBNs have already been used successfully for phone recognition with input coefficients that are MFCCs or filterbank outputs [1, 2]. In this paper, we demonstrate that they work even better when their inputs are speaker adaptive, discriminative features. On the standard TIMIT corpus, they give phone error rates of 19.6% using monophone HMMs and a bigram language model and 19.4% using monophone HMMs and a trigram language model.
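The summary describes a standard DBN recipe: greedily pretrain a stack of RBMs on windows of speech coefficients, then reuse the learned weights to initialize a feed-forward network whose softmax output predicts the HMM state of the window's central frame. The NumPy sketch below illustrates that recipe only; it is not the authors' implementation, and the window size, layer widths, learning rate, number of epochs, CD-1 training, and the 183-state output layer are all illustrative assumptions.

```python
# Illustrative sketch (not the paper's code): greedy RBM pretraining of a
# DBN on windows of speech features, then reuse of the learned weights to
# initialize a feed-forward classifier over HMM states.
# All dimensions, learning rates, and data below are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.01):
    """One-step contrastive divergence (CD-1) on inputs scaled to [0, 1]."""
    n_visible = data.shape[1]
    W = 0.01 * rng.standard_normal((n_visible, n_hidden))
    b_v = np.zeros(n_visible)
    b_h = np.zeros(n_hidden)
    for _ in range(epochs):
        # Positive phase: hidden probabilities and a binary sample.
        h_prob = sigmoid(data @ W + b_h)
        h_sample = (rng.random(h_prob.shape) < h_prob).astype(float)
        # Negative phase: one reconstruction step.
        v_recon = sigmoid(h_sample @ W.T + b_v)
        h_recon = sigmoid(v_recon @ W + b_h)
        # Update from the difference of data and reconstruction statistics.
        W += lr * (data.T @ h_prob - v_recon.T @ h_recon) / len(data)
        b_v += lr * (data - v_recon).mean(axis=0)
        b_h += lr * (h_prob - h_recon).mean(axis=0)
    return W, b_h

# Hypothetical input: 1000 windows of 11 frames x 40 coefficients each.
windows = rng.random((1000, 11 * 40))
layer_sizes = [512, 512]  # hypothetical hidden layer widths

# Greedy layer-wise pretraining: each RBM models the previous layer's output.
weights, layer_input = [], windows
for n_hidden in layer_sizes:
    W, b_h = train_rbm(layer_input, n_hidden)
    weights.append((W, b_h))
    layer_input = sigmoid(layer_input @ W + b_h)

# The pretrained (W, b_h) pairs now initialize the hidden layers of a
# feed-forward network; a randomly initialized softmax layer on top is
# then trained by backpropagation to predict the HMM state of the
# window's central frame.
n_states = 183  # e.g., 61 monophones x 3 states, TIMIT-style
W_out = 0.01 * rng.standard_normal((layer_sizes[-1], n_states))
logits = layer_input @ W_out
posteriors = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
```

The key design point the abstract emphasizes is the initialization: the softmax layer starts random, but the hidden layers start from generatively pretrained weights rather than random ones, which is what makes the subsequent discriminative fine-tuning perform markedly better.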
ISBN: 9781457705380, 1457705389
ISSN: 1520-6149
DOI: 10.1109/ICASSP.2011.5947494