A simple statistical speech recognition of mandarin monosyllables

Bibliographic Details
Published in	Applied mathematics and computation Vol. 177; no. 2; pp. 644 - 651
Main Authors	Li, Tze Fen; Chang, Shui-Ching; Lee, Chung-Bow
Format	Journal Article
Language	English
Published	New York, NY: Elsevier Inc, 15 June 2006
Subjects	Absolute value, Applied mathematics, Applied sciences, Artificial intelligence, Bayes decision, Bayes decision rule, Computer science; control theory; systems, Decision rule, Decision theory, Distribution function, Exact sciences and technology, Exponential distribution, Feature extraction, Gaussian distribution, Linear predict coding, Mathematics, Numerical analysis, Probability and statistics, Probability distribution, Sciences and techniques of general use, Speech and sound recognition and synthesis. Linguistics, Speech recognition, Statistics
Summary:	Each Mandarin syllable is represented by a sequence of linear predictive coding cepstra (LPCC) vectors. Since all syllables have a simple phonetic structure, in our speech recognition we partition the sequence of LPCC vectors of each syllable into equal segments and average the LPCC vectors in each segment. The segment mean vectors are used as the feature of the syllable. This simple feature needs none of the time-consuming, complicated nonlinear contraction and expansion used in dynamic time warping. We propose several probability distributions for the feature values. A simplified Bayes decision rule is used to classify Mandarin syllables. For speaker-independent Mandarin digits, the recognition rate is 98.6% when a normal distribution is used for the feature values and 98.1% when an exponential distribution is used for the absolute values of the features. The feature proposed in this paper is the simplest representation of a syllable and is much easier to extract than any other known feature. Feature extraction and classification are much faster and more accurate than with the HMM method or any other known technique.
ISSN:	0096-3003 1873-5649
DOI:	10.1016/j.amc.2005.09.094
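
Illustration:	The summary describes two steps: averaging LPCC vectors over equal segments to obtain a fixed-length feature, and classifying that feature with a Bayes rule under a normal distribution. The Python sketch below is one possible reading of those steps, not the authors' implementation. The function names, the independence of feature components (a diagonal Gaussian), the equal class priors, and the variance floor are all assumptions made for illustration; the record does not say how the paper simplifies the Bayes rule, nor how many segments or LPCC coefficients it uses.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Model = Tuple[Vector, Vector]  # (per-component means, per-component variances)


def segment_mean_feature(frames: Sequence[Sequence[float]], n_segments: int) -> Vector:
    """Split the frame sequence into n_segments near-equal contiguous
    segments, average each segment, and concatenate the segment means."""
    n_frames = len(frames)
    if n_segments < 1 or n_frames < n_segments:
        raise ValueError("need at least one frame per segment")
    dim = len(frames[0])
    feature: Vector = []
    for s in range(n_segments):
        start = s * n_frames // n_segments
        stop = (s + 1) * n_frames // n_segments
        segment = frames[start:stop]
        for d in range(dim):
            feature.append(sum(f[d] for f in segment) / len(segment))
    return feature


def fit_gaussian(features: Sequence[Vector], var_floor: float = 1e-6) -> Model:
    """Estimate a mean and variance for every feature component of one class.
    Assumption: components are treated as independent."""
    n = len(features)
    dim = len(features[0])
    means = [sum(f[d] for f in features) / n for d in range(dim)]
    variances = [
        max(sum((f[d] - means[d]) ** 2 for f in features) / n, var_floor)
        for d in range(dim)
    ]
    return means, variances


def log_likelihood(x: Vector, means: Vector, variances: Vector) -> float:
    """Log density of x under independent normal components."""
    return sum(
        -0.5 * math.log(2.0 * math.pi * v) - (xi - m) ** 2 / (2.0 * v)
        for xi, m, v in zip(x, means, variances)
    )


def classify(x: Vector, models: Dict[str, Model]) -> str:
    """Bayes decision with equal priors (an assumption): choose the class
    whose model gives the feature the highest log-likelihood."""
    return max(models, key=lambda label: log_likelihood(x, *models[label]))
```

In use, each training utterance would be reduced with `segment_mean_feature`, one model per syllable would be fitted with `fit_gaussian`, and an unknown utterance would be labeled with `classify`. Because every utterance maps to a vector of the same length regardless of its duration, no time alignment step is required, which is the property the summary contrasts with dynamic time warping.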