Bayesian Speaker Adaptation Based on a New Hierarchical Probabilistic Model

Bibliographic Details
Published in	IEEE Transactions on Audio, Speech, and Language Processing, Vol. 20, no. 7, pp. 2002-2015
Main Authors	Wen-Lin Zhang, Wei-Qiang Zhang, Bi-Cheng Li, Dan Qu, M. T. Johnson
Format	Journal Article
Language	English
Published	Piscataway, NJ: IEEE (Institute of Electrical and Electronics Engineers), 01.09.2012
Subjects	Adaptation; Adaptation models; Applied sciences; Bayesian analysis; Correlation; Eigenphones; eigenvoices; Exact sciences and technology; Hidden Markov models; hierarchical model; Information, signal and communications theory; Mathematical models; maximum a posteriori (MAP); Principal component analysis; Probabilistic logic; Probability theory; Signal processing; speaker adaptation; Speech; Speech processing; Subspaces; Telecommunications and information theory; Telephones; Training; Vectors; Performance evaluation; Probabilistic approach; Linear regression; Subspace method; Algorithm; Modeling; Statistical method; A posteriori estimation; Batch process; Speech recognition; On line processing; Chinese; Probabilistic model; Maximum likelihood; Bayes methods; Hierarchical system
Summary:	In this paper, a new hierarchical Bayesian speaker adaptation method called HMAP is proposed that combines the advantages of three conventional algorithms: maximum a posteriori (MAP), maximum-likelihood linear regression (MLLR), and eigenvoice, resulting in excellent performance across a wide range of adaptation conditions. The new method efficiently utilizes intra-speaker and inter-speaker correlation information by modeling phone and speaker subspaces in a consistent hierarchical Bayesian way. The phone variations for a specific speaker are assumed to be located in a low-dimensional subspace. The phone coordinate, which is shared among different speakers, implicitly contains the intra-speaker correlation information. For a specific speaker, the phone variations, represented by speaker-dependent eigenphones, are concatenated into a supervector. The eigenphone supervector space is also a low-dimensional speaker subspace, which contains inter-speaker correlation information. Using principal component analysis (PCA), a new hierarchical probabilistic model for the generation of the speech observations is obtained. Speaker adaptation based on the new hierarchical model is derived using the maximum a posteriori criterion in a top-down manner. Both batch adaptation and online adaptation schemes are proposed. With tuned parameters, the new method can handle varying amounts of adaptation data automatically and efficiently. Experimental results on a Mandarin Chinese continuous speech recognition task show good performance under all testing conditions.
ISSN:	1558-7916 1558-7924
DOI:	10.1109/TASL.2012.2193390
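
The summary describes speaker supervectors that lie in a low-dimensional subspace found with PCA. The sketch below illustrates only that generic idea, in the style of eigenvoice methods, on synthetic data. It is not the HMAP model from the paper: the function names `fit_subspace` and `project`, the dimensions, and the random data are all assumptions made for illustration.

```python
import numpy as np

def fit_subspace(supervectors, k):
    """PCA via SVD: return the mean and the top-k orthonormal basis vectors (as rows)."""
    mean = supervectors.mean(axis=0)
    centered = supervectors - mean
    # Rows of vt are principal directions, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]

def project(supervector, mean, basis):
    """Return the subspace coordinates of one supervector and its reconstruction."""
    coords = basis @ (supervector - mean)
    return coords, mean + basis.T @ coords

# Synthetic, hypothetical data: 50 "speakers", 120-dimensional supervectors
# that lie exactly in a 3-dimensional subspace around a constant offset.
rng = np.random.default_rng(0)
true_basis = rng.standard_normal((3, 120))
weights = rng.standard_normal((50, 3))
supervectors = weights @ true_basis + 5.0

mean, basis = fit_subspace(supervectors, k=3)
coords, recon = project(supervectors[0], mean, basis)
```

Here each speaker is summarized by three coordinates instead of 120 values, which is why subspace methods of this kind can work with little adaptation data.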