Statistical Transformation of Language and Pronunciation Models for Spontaneous Speech Recognition
Published in | IEEE transactions on audio, speech, and language processing Vol. 18; no. 6; pp. 1539 - 1549 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Piscataway, NJ; IEEE; 01.08.2010; Institute of Electrical and Electronics Engineers |
Subjects | |
Summary: | We propose a novel approach based on a statistical transformation framework for language and pronunciation modeling of spontaneous speech. Since it is not practical to train a spoken-style model using numerous spoken transcripts, the proposed approach generates a spoken-style model by transforming an orthographic model trained with document archives such as the minutes of meetings and the proceedings of lectures. The transformation is based on a statistical model estimated from a small parallel corpus, which consists of faithful transcripts aligned with their orthographic documents. Patterns of transformation, such as substitution, deletion, and insertion of words, are extracted with their word and part-of-speech (POS) contexts, and transformation probabilities are estimated based on occurrence statistics in the parallel aligned corpus. For pronunciation modeling, subword-based mappings between baseforms and surface forms are extracted with their occurrence counts, and a set of rewrite rules with associated probabilities is then derived as a transformation model. Spoken-style language models and pronunciation models (surface forms) can be predicted by applying these transformation patterns to a document-style language model and to the baseforms in a lexicon, respectively. The transformed models significantly reduced perplexity and word error rates (WERs) in transcribing congressional meetings, even though the domains and topics were different from those of the parallel corpus. This result demonstrates the generality and portability of the proposed framework. |
---|---|
ISSN: | 1558-7916 1558-7924 |
DOI: | 10.1109/TASL.2009.2037400 |
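
The summary states that transformation probabilities are estimated from occurrence statistics in an aligned parallel corpus. As a rough illustration only, the sketch below estimates such probabilities by plain relative frequency over aligned (document-style, spoken-style) pairs. The function name `estimate_transform_probs`, the use of an empty string to mark an insertion or deletion, and the toy pairs are all assumptions made for this example; the word and POS contexts mentioned in the summary are left out, and this is not claimed to be the authors' estimation procedure.

```python
from collections import Counter, defaultdict


def estimate_transform_probs(aligned_pairs):
    """Estimate P(spoken | written) by relative frequency.

    aligned_pairs: iterable of (written, spoken) string pairs taken from a
    hypothetical aligned parallel corpus. An empty string on the written
    side marks an insertion; on the spoken side it marks a deletion.
    """
    # Count how often each written form is realised as each spoken form.
    counts = defaultdict(Counter)
    for written, spoken in aligned_pairs:
        counts[written][spoken] += 1

    # Normalise the counts for each written form into probabilities.
    probs = {}
    for written, spoken_counts in counts.items():
        total = sum(spoken_counts.values())
        probs[written] = {s: n / total for s, n in spoken_counts.items()}
    return probs


# Toy data (invented for illustration): two substitutions, one identity
# mapping, and one filler insertion.
toy_pairs = [
    ("do not", "don't"),
    ("do not", "do not"),
    ("do not", "don't"),
    ("", "uh"),
]
toy_probs = estimate_transform_probs(toy_pairs)
```

With the toy pairs, `toy_probs["do not"]` assigns two thirds of the probability mass to the contracted form and one third to the unchanged form. A context-dependent variant would key the counts on the surrounding words or POS tags as well as the written form.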