fMPE: discriminatively trained features for speech recognition

Bibliographic Details
Published in Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005, Vol. 1, pp. I/961 - I/964
Main Authors Povey, D., Kingsbury, B., Mangu, L., Saon, G., Soltau, H., Zweig, G.
Format Conference Proceeding
Language English
Published IEEE 2005
Summary: MPE (minimum phone error) is a previously introduced technique for discriminative training of HMM parameters. fMPE applies the same objective function to the features, transforming the data with a kernel-like method and training millions of parameters, comparable to the size of the acoustic model. Despite the large number of parameters, fMPE is robust to over-training. The method is to train a matrix projecting from posteriors of Gaussians to a normal size feature space, and then to add the projected features to normal features such as PLP. The matrix is trained from a zero start using a linear method. Sparsity of posteriors ensures speed in both training and test time. The technique gives similar improvements to MPE (around 10% relative). MPE on top of fMPE results in error rates up to 6.5% relative better than MPE alone, or more if multiple layers of transform are trained.
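Illustration: the feature transform described in the summary can be sketched as y = x + M h, where x is a normal feature vector (such as PLP), h is a sparse vector of Gaussian posteriors, and M is the trained projection matrix. The sketch below is a minimal, hypothetical illustration and not the authors' implementation. The function names, the equal-prior diagonal Gaussians, the top_k pruning rule, and the toy numbers are all assumptions made for the example, and the MPE-based training of M is not shown.

```python
import math

def gaussian_posteriors(x, means, variances, top_k=2):
    """Posteriors of diagonal Gaussians for frame x (equal priors assumed).

    Only the top_k largest posteriors are kept (without renormalizing),
    so the result is sparse: a dict mapping Gaussian index to posterior.
    """
    log_liks = []
    for mu, var in zip(means, variances):
        ll = 0.0
        for xi, mi, vi in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
        log_liks.append(ll)
    peak = max(log_liks)  # subtract the max for numerical stability
    weights = [math.exp(ll - peak) for ll in log_liks]
    total = sum(weights)
    posts = [w / total for w in weights]
    keep = sorted(range(len(posts)), key=lambda i: posts[i], reverse=True)[:top_k]
    return {i: posts[i] for i in keep}

def fmpe_transform(x, sparse_posts, M):
    """Return y = x + M h, touching only the nonzero entries of h.

    M has one row per feature dimension and one column per Gaussian.
    """
    y = list(x)
    for j, p in sparse_posts.items():
        for d in range(len(x)):
            y[d] += M[d][j] * p
    return y

# Toy setup (made-up values): 3 Gaussians in a 2-dimensional feature space.
means = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
variances = [[1.0, 1.0], [1.0, 1.0], [1.0, 1.0]]
M = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]  # zero start, as in the summary
x = [0.2, -0.1]

h = gaussian_posteriors(x, means, variances)
y = fmpe_transform(x, h, M)
```

With M at its zero start the transformed features equal the original ones, so training begins from the unmodified front end. Because only a few posteriors per frame are kept, the cost of the update scales with the number of retained Gaussians rather than with the full width of M, which is the speed benefit the summary attributes to sparsity.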
ISBN: 9780780388741, 0780388747
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2005.1415275