Language model verbalization for automatic speech recognition

Bibliographic Details
Published in: 2013 IEEE International Conference on Acoustics, Speech and Signal Processing, pp. 8262 - 8266
Main Authors: Sak, Hasim; Beaufays, Francoise; Nakajima, Kaisuke; Allauzen, Cyril
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2013

More Information
Summary: Transcribing speech in properly formatted written language presents some challenges for automatic speech recognition systems. The difficulty arises from the conversion ambiguity between verbal and written language in both directions. Non-lexical vocabulary items such as numeric entities, dates, times, abbreviations and acronyms are particularly ambiguous. This paper describes a finite-state transducer based approach that improves proper transcription of these entities. The approach involves training a language model in the written language domain, and integrating verbal expansions of vocabulary items as a finite-state model into the decoding graph construction. We build an inverted finite-state transducer to map written vocabulary items to alternate verbal expansions using rewrite rules. Then, this verbalizer transducer is composed with the n-gram language model to obtain a verbalized language model, whose input labels are in the verbal language domain while output labels are in the written language domain. We show that the proposed approach is very effective in improving the recognition accuracy of numeric entities.
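As a rough illustration of the construction described in the summary, the Python sketch below uses pynini (a Python interface to OpenFst) to build a toy verbalizer, invert it so its input side is verbal and its output side is written, and compose it with a stand-in written-domain model. The written/verbal pairs, the single-sentence acceptor used as the "language model", and the pynini 2.1+ API are assumptions made for illustration and are not taken from the paper.

# A minimal sketch of the verbalized language model construction using
# pynini (a Python wrapper around OpenFst). The written/verbal pairs and
# the toy "language model" below are illustrative assumptions, not the
# rewrite grammars or n-gram model used in the paper; pynini 2.1+ is assumed.
import pynini
from pynini.lib import rewrite

# Verbalizer: maps written vocabulary items to alternate verbal expansions.
# The paper derives these mappings from rewrite rules; a few hand-written
# pairs stand in for them here.
verbalizer = pynini.union(
    pynini.cross("2013", "twenty thirteen"),
    pynini.cross("2013", "two thousand thirteen"),
    pynini.cross("8:30", "eight thirty"),
).optimize()

# Invert the verbalizer so its input labels are verbal expansions and its
# output labels are written forms.
inverted_verbalizer = verbalizer.copy()
inverted_verbalizer.invert()

# Stand-in for the written-domain n-gram language model G: here just an
# unweighted acceptor over one written token, purely for illustration.
written_lm = pynini.accep("2013")

# Composing the inverted verbalizer with the written-domain model yields the
# verbalized model: input labels in the verbal domain, output labels in the
# written domain, scored by the written-domain model.
verbalized_lm = pynini.compose(inverted_verbalizer, written_lm).optimize()

# Check: the verbal phrase "twenty thirteen" maps back to the written "2013".
print(rewrite.top_rewrite("twenty thirteen", verbalized_lm))  # -> 2013

In the paper the verbalizer is built from rewrite rules over the full vocabulary and the written-domain model is a real n-gram language model integrated into the decoding graph; the sketch above only mirrors the inversion-then-composition step that produces the verbal-input, written-output model.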
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2013.6639276