fMPE: discriminatively trained features for speech recognition

Bibliographic Details
Published in	Proceedings (ICASSP '05), IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, Vol. 1, pp. I/961-I/964
Main Authors	Povey, D., Kingsbury, B., Mangu, L., Saon, G., Soltau, H., Zweig, G.
Format	Conference Proceeding
Language	English
Published	IEEE 2005
Subjects	Error analysis; Gaussian processes; Hidden Markov models; Robustness; Speech recognition; Statistics; Testing; Training data

Summary:	MPE (minimum phone error) is a previously introduced technique for discriminative training of HMM parameters. fMPE applies the same objective function to the features, transforming the data with a kernel-like method and training millions of parameters, comparable to the size of the acoustic model. Despite the large number of parameters, fMPE is robust to over-training. The method is to train a matrix that projects from posteriors of Gaussians to a normal-sized feature space, and then to add the projected features to normal features such as PLP. The matrix is trained from a zero start using a linear method. Sparsity of the posteriors ensures speed at both training and test time. The technique gives improvements similar to those of MPE (around 10% relative). MPE applied on top of fMPE yields error rates up to 6.5% relative lower than with MPE alone, or lower still if multiple layers of transform are trained.
ISBN:	9780780388741, 0780388747
ISSN:	1520-6149, 2379-190X
DOI:	10.1109/ICASSP.2005.1415275
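
The feature transform described in the Summary (project Gaussian posteriors through a matrix and add the result to the normal features) can be illustrated with a minimal sketch. It covers only the application of an already-trained matrix, not the MPE-based training. The function names, the toy two-Gaussian model, the posterior floor of 1e-3 used to exploit sparsity, and the "trained" matrix values are all assumptions made for illustration and are not taken from the paper.

```python
import math

def gaussian_posteriors(x, means, variances, weights):
    """Posterior of each diagonal-covariance Gaussian given one frame x."""
    log_liks = []
    for mu, var, w in zip(means, variances, weights):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_liks.append(ll)
    # Normalise in the log domain for numerical stability.
    top = max(log_liks)
    exps = [math.exp(ll - top) for ll in log_liks]
    total = sum(exps)
    return [e / total for e in exps]

def fmpe_transform(x, posteriors, M, floor=1e-3):
    """Return y = x + M h, skipping posteriors below `floor` (assumed value).

    M has one row per feature dimension and one column per Gaussian.
    Skipping near-zero posteriors is what makes the product cheap.
    """
    y = list(x)
    for j, p in enumerate(posteriors):
        if p < floor:
            continue
        for i in range(len(y)):
            y[i] += M[i][j] * p
    return y

# Toy setup (hypothetical): 2-dimensional features, 2 Gaussians.
means = [[0.0, 0.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0]]
weights = [0.5, 0.5]
x = [0.1, -0.2]

h = gaussian_posteriors(x, means, variances, weights)

# With the zero matrix used as the starting point, features are unchanged.
M_zero = [[0.0, 0.0], [0.0, 0.0]]
y_zero = fmpe_transform(x, h, M_zero)

# A made-up "trained" matrix shifts the features by the projected posteriors.
M_trained = [[0.5, -0.5], [0.2, 0.1]]
y = fmpe_transform(x, h, M_trained)
```

Because the matrix starts at zero, the transformed features initially equal the original ones, so training begins from the baseline system. In this toy frame only the first Gaussian has a non-negligible posterior, so only the first column of the matrix contributes to the output.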