An automatic method for learning a Japanese lexicon for recognition of spontaneous speech

Bibliographic Details
Published in: Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181), Vol. 1, pp. 305-308
Main Authors: Mayfield Tomokiyo, L.; Ries, K.
Format: Conference Proceeding
Language: English
Published: IEEE, 1998
Summary: When developing a speech recognition system, one must start by deciding what the units to be recognized should be. This is for the most part a straightforward choice in the case of word-based languages such as English, but becomes an issue even in handling languages with a complex compounding system like German; with an agglutinative language like Japanese, which provides no spaces in written text, the choice is not at all obvious. Once an appropriate unit has been determined, the problem of consistently segmenting transcriptions of training data must be addressed. This paper describes a method for learning a lexicon from a training corpus which contains no word-level segmentation, applied to the problem of building a Japanese speech recognition system. We show not only that one can satisfactorily segment transcribed training data automatically, avoiding human error, but also that our system, when trained with the automatically segmented corpus, showed a significant improvement in recognition performance.
ISBN: 9780780344280, 0780344286
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.1998.674428