Subspace constrained Gaussian mixture models for speech recognition

Bibliographic Details
Published in	IEEE Transactions on Speech and Audio Processing, Vol. 13, no. 6, pp. 1144-1160
Main Authors	Axelrod, S., Goel, V., Gopinath, R.A., Olsen, P.A., Visweswariah, K.
Format	Journal Article
Language	English
Published	New York, NY: IEEE (Institute of Electrical and Electronics Engineers), 01.11.2005
Subjects	Applied sciences; Automatic recognition; Automatic speech recognition; Computational complexity; Constraints; Covariance; Covariance matrix; covariance modeling; Discriminant analysis; Discrimination; Error analysis; Error rate; Exact sciences and technology; exponential family; Feature extraction; Gaussian; Gaussian distribution; Gaussian process; Grammar; Hidden Markov models; Information, signal and communications theory; Learning algorithm; Linear discriminant analysis; Linear model; Linear transformation; Linear transformations; Mathematical analysis; Mathematical models; Maximum likelihood; maximum likelihood estimation of SCGMM; Mixture theory; Modeling; Probabilistic approach; Signal processing; Speech processing; Speech recognition; subspace constrained exponential models; subspace constrained Gaussian mixture models (SCGMMs); Subspace constraints; Subspace method; Subspaces; Telecommunications and information theory; Vectors; Vectors (mathematics); Vocabulary

More Information
Summary:	A standard approach to automatic speech recognition uses hidden Markov models whose state-dependent distributions are Gaussian mixture models. Each Gaussian can be viewed as an exponential model whose features are linear and quadratic monomials in the acoustic vector. We consider here models in which the weight vectors of these exponential models are constrained to lie in an affine subspace shared by all the Gaussians. This class of models includes Gaussian models with linear constraints placed on the precision (inverse covariance) matrices (such as diagonal covariance, maximum likelihood linear transformation, or extended maximum likelihood linear transformation), as well as the LDA/HLDA models used for feature selection, which tie the part of the Gaussians in the directions not used for discrimination. In this paper, we present algorithms for training these models using a maximum likelihood criterion. We present experiments on both small-vocabulary, resource-constrained, grammar-based tasks and large-vocabulary, unconstrained-resource tasks to explore the rather large parameter space of models that fit within our framework. In particular, we demonstrate that significant improvements can be obtained in both word error rate and computational complexity.
ISSN:	1063-6676, 1558-2353
DOI:	10.1109/TSA.2005.851965
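
The summary's central idea, a Gaussian written as an exponential model whose weight vector lies in a shared affine subspace, can be illustrated with a small sketch. This is not the authors' code: the two-dimensional setting, the feature ordering, and the names `features`, `weights_from_subspace`, and `log_density` are assumptions made for illustration only. The example uses the diagonal covariance case named in the summary, where the off-diagonal precision weight is pinned to zero by the choice of basis.

```python
import math


def features(x):
    """Linear and quadratic monomial features of a 2-D acoustic vector.

    The quadratic part is -1/2 * x x^T flattened as (11, 12, 22); the
    off-diagonal term is doubled because the precision matrix is symmetric.
    """
    x1, x2 = x
    return [x1, x2, -0.5 * x1 * x1, -x1 * x2, -0.5 * x2 * x2]


def weights_from_subspace(b0, basis, lam):
    """Weight vector b0 + sum_k lam[k] * basis[k] (a point in an affine subspace)."""
    return [b0[i] + sum(l * b[i] for l, b in zip(lam, basis))
            for i in range(len(b0))]


def log_density(x, w):
    """Gaussian log-density in exponential form.

    w = (psi1, psi2, p11, p12, p22), where P is the precision matrix and
    psi = P * mu. The value is w . features(x) plus a log-normalizer.
    """
    psi1, psi2, p11, p12, p22 = w
    det = p11 * p22 - p12 * p12
    if p11 <= 0 or det <= 0:
        raise ValueError("precision matrix must be positive definite")
    # Recover the mean, mu = P^{-1} psi, to evaluate psi^T P^{-1} psi.
    mu1 = (p22 * psi1 - p12 * psi2) / det
    mu2 = (-p12 * psi1 + p11 * psi2) / det
    log_norm = (0.5 * math.log(det) - math.log(2 * math.pi)
                - 0.5 * (psi1 * mu1 + psi2 * mu2))
    return sum(wi * fi for wi, fi in zip(w, features(x))) + log_norm


# Diagonal covariance as a subspace constraint: the basis spans every
# weight except p12, so p12 stays at its offset value of zero.
b0 = [0.0, 0.0, 0.0, 0.0, 0.0]
basis = [
    [1.0, 0.0, 0.0, 0.0, 0.0],  # psi1
    [0.0, 1.0, 0.0, 0.0, 0.0],  # psi2
    [0.0, 0.0, 1.0, 0.0, 0.0],  # p11
    [0.0, 0.0, 0.0, 0.0, 1.0],  # p22
]

mu = (1.0, -2.0)   # illustrative mean
var = (0.5, 2.0)   # illustrative variances
lam = [mu[0] / var[0], mu[1] / var[1], 1.0 / var[0], 1.0 / var[1]]
w = weights_from_subspace(b0, basis, lam)

x = (0.3, 0.7)
lp_exp = log_density(x, w)

# Reference: the usual diagonal-covariance Gaussian log-density.
lp_ref = sum(-0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
             for xi, m, v in zip(x, mu, var))
```

In this sketch `lp_exp` and `lp_ref` agree up to floating-point rounding, which is the sense in which a diagonal covariance Gaussian is one member of the constrained family. In a mixture, `b0` and `basis` would be shared by all Gaussians while each Gaussian keeps its own `lam`; fitting those quantities by maximum likelihood is the subject of the paper and is not attempted here.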