Syntactic reanalysis in language models for speech recognition

Bibliographic Details
Published in	2017 Joint IEEE International Conference on Development and Learning and Epigenetic Robotics (ICDL-EpiRob), pp. 215-220
Main Authors	Twiefel, Johannes; Hinaut, Xavier; Wermter, Stefan
Format	Conference Proceeding
Language	English
Published	IEEE 01.09.2017
Subjects	Acoustics; Data models; Google; Grammar; Hidden Markov models; Speech; Speech recognition
Summary:	State-of-the-art speech recognition systems steadily increase their performance using different variants of deep neural networks, and they postprocess the results with N-gram statistical models trained on large amounts of general-purpose domain data. While achieving excellent performance in terms of Word Error Rate (17.343% on our Human-Robot Interaction data set), state-of-the-art systems generate hypotheses that are grammatically incorrect in 57.316% of the cases. Moreover, if employed in a restricted domain (e.g. Human-Robot Interaction), around 50% of the hypotheses contain out-of-domain words. These out-of-domain words are confused with similarly pronounced in-domain words and cannot be interpreted by a domain-specific inference system. State-of-the-art speech recognition systems lack a mechanism that addresses the syntactic correctness of hypotheses. We propose a system that can detect and repair grammatically incorrect or infrequent sentence forms. It is inspired by a computational neuroscience model that we developed previously. The current system is still a proof-of-concept version of a future, neurobiologically more plausible neural network model. The resulting system postprocesses the sentence hypotheses of state-of-the-art speech recognition systems, producing in-domain words in 100% of the cases and syntactically and grammatically correct hypotheses in 90.319% of the cases. Moreover, it reduces the Word Error Rate to 11.038%.
ISSN:	2161-9484
DOI:	10.1109/DEVLRN.2017.8329810
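
Note on the Word Error Rate figures in the summary: this record does not say how the authors computed them. The sketch below shows the conventional definition only, assuming WER is the word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name and the example sentences are hypothetical and are not taken from the paper's data set.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Conventional WER: word-level edit distance / number of reference words.

    Assumption: whitespace tokenization and case-sensitive comparison.
    The paper's own evaluation setup may differ.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # i deletions turn i reference words into an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Made-up example: one similarly pronounced word is substituted.
    print(word_error_rate("bring me the red cup", "bring me the bread cup"))
```

Under this definition, a single substituted word in a five-word reference gives a WER of 20%, which is the scale on which the summary's 17.343% and 11.038% figures should be read.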