Unsupervised and semi-supervised adaptation of a hybrid speech recognition system

Bibliographic Details
Published in: 2012 IEEE 11th International Conference on Signal Processing, Vol. 1, pp. 527-530
Main Authors: Trmal, J.; Zelinka, J.; Muller, L.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.10.2012
Summary: This paper evaluates a recently published method for supervised and unsupervised adaptation of neural networks used in hybrid speech recognition systems. The neural networks used in the field of hybrid speech recognition have certain distinct characteristics that make the usual adaptation methods (such as retraining the neural network) unusable or ineffective. The recently published MELT (Minimum Error Linear Transform) method [1] has been developed to cope with this issue. By providing a way of establishing a link between the intermediate features and the long temporal features, the number of free variables can be reduced significantly and the resulting adaptation parameters can be estimated robustly. The experiments were performed on the WSJCAM0 speech corpus. Contrary to the original paper [1], the experiments were performed using a word recognizer instead of a phoneme recognizer. The experimental results suggest that the MELT method can be used in both an unsupervised and a semi-supervised manner, and when applied it leads to a significant reduction in word error rate, even with a strong language model.
ISBN: 9781467321969; 1467321966
ISSN: 2164-5221
DOI: 10.1109/ICoSP.2012.6491542
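To make the idea in the summary concrete, the following is a minimal sketch of the general technique it describes: adapting a hybrid NN/HMM acoustic model by estimating a small speaker-specific affine transform on intermediate features, using state labels from a first recognition pass so the procedure can run unsupervised. This is not the MELT parametrization from the paper; the feature dimensions, learning rate, frozen output layer, and the function name adapt_linear_transform are all illustrative assumptions.

import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def adapt_linear_transform(feats, pseudo_labels, out_w, out_b, n_iters=20, lr=0.05):
    """Estimate a speaker-specific affine transform (A, b) on intermediate features.

    feats         : (T, D) intermediate-layer features for the adaptation data
    pseudo_labels : (T,)   state labels from a first-pass decode (unsupervised case)
    out_w, out_b  : frozen output-layer weights (D, K) and biases (K,) of the network
    """
    T, D = feats.shape
    K = out_w.shape[1]
    A = np.eye(D)                 # start from the identity: no adaptation
    b = np.zeros(D)
    onehot = np.eye(K)[pseudo_labels]
    for _ in range(n_iters):
        adapted = feats @ A + b                      # (T, D) adapted features
        post = softmax(adapted @ out_w + out_b)      # (T, K) state posteriors
        # Gradient of mean cross-entropy w.r.t. the adapted features
        g_feat = (post - onehot) @ out_w.T / T       # (T, D)
        A -= lr * (feats.T @ g_feat)                 # chain rule: d(loss)/dA
        b -= lr * g_feat.sum(axis=0)                 # d(loss)/db
    return A, b

Only the D*D + D transform parameters are estimated while the network itself stays fixed, which reflects the summary's point that reducing the number of free variables allows the adaptation parameters to be estimated robustly from limited per-speaker data.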