Mutual Information Based Dynamic Integration of Multiple Feature Streams for Robust Real-Time LVCSR

Bibliographic Details
Published in	IEICE Transactions on Information and Systems Vol. E91.D; no. 3; pp. 815 - 824
Main Authors	SATO, Shoei, KOBAYASHI, Akio, ONOE, Kazuo, HOMMA, Shinichi, IMAI, Toru, TAKAGI, Tohru, KOBAYASHI, Tetsunori
Format	Journal Article
Language	English
Published	Oxford: The Institute of Electronics, Information and Communication Engineers; Oxford University Press, 2008
Subjects	active hypotheses | Algorithm | Amplitude modulation | Applied sciences | Artificial intelligence | Audiovisual document | Background noise | Cable television | Computer science; control theory; systems | Discriminant analysis | Dynamical systems | Dynamics | Entropy | Exact sciences and technology | Feature extraction | Frequency drift | Frequency modulation | Hidden Markov models | Information integration | Information, signal and communications theory | Japanese | Mathematical analysis | Modulation, demodulation | Mutual information | News | Probabilistic approach | Real time | Resonance frequency | Searching | Signal and communications theory | Signal processing | Speech and sound recognition and synthesis. Linguistics | Speech processing | Speech recognition | stream integration | Streams | Systems, networks and services of telecommunications | Telecommunications | Telecommunications and information theory | Transmission and modulation (techniques and equipments) | Weighting
Summary:	We present a novel method of integrating the likelihoods of multiple feature streams, representing different acoustic aspects, for robust speech recognition. The integration algorithm dynamically calculates a frame-wise stream weight so that a higher weight is given to a stream that is robust to a variety of noisy environments or speaking styles. Such a robust stream is expected to show discriminative ability. A conventional method proposed for the recognition of spoken digits calculates the weights from the entropy of the whole set of HMM states. This paper extends the dynamic weighting to a real-time large-vocabulary continuous speech recognition (LVCSR) system. The proposed weight is calculated in real time from the mutual information between an input stream and the active HMM states in a search space, without an additional likelihood calculation. Furthermore, the mutual information takes the width of the search space into account by calculating the marginal entropy from the number of active states. In this paper, we integrate three features that are extracted through auditory filters, taking into account the human auditory system's ability to extract amplitude and frequency modulations. Accordingly, features representing energy, amplitude drift, and resonant frequency drift are integrated. These features are expected to provide complementary clues for speech recognition. Speech recognition experiments on field reports and spontaneous commentary from Japanese broadcast news showed that the proposed method reduced word errors by 9.2% in field reports and 4.7% in spontaneous commentaries relative to the best result obtained from a single stream.
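The weighting idea in the summary can be illustrated with a minimal Python sketch. It is not the paper's formulation, which this record does not give. The sketch assumes a uniform prior over the active states (so the marginal entropy is the log of the number of active states), a posterior obtained by normalizing each stream's likelihoods over the active states, and weights proportional to the resulting mutual-information estimate. All function names are hypothetical.

```python
import math

EPS = 1e-12  # below this total, the streams are treated as uninformative


def posteriors(log_likes):
    """Normalize one stream's log-likelihoods over the active states.

    Assumption: uniform prior over active states, so the posterior is
    just the normalized likelihood.
    """
    m = max(log_likes)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in log_likes]
    z = sum(exps)
    return [e / z for e in exps]


def stream_information(log_likes):
    """Estimate of mutual information between a stream and the active states.

    Marginal entropy is log(N) for N active states (assumed uniform prior);
    conditional entropy is the entropy of the posterior for this frame.
    """
    post = posteriors(log_likes)
    cond_entropy = -sum(p * math.log(p) for p in post if p > 0.0)
    # Clamp tiny negative values caused by floating-point rounding.
    return max(0.0, math.log(len(log_likes)) - cond_entropy)


def stream_weights(frame):
    """Frame-wise weights, one per stream, summing to 1.

    `frame` is a list of streams; each stream is a list of log-likelihoods
    for the same active states. Assumption: weights are proportional to
    the information estimate, with a uniform fallback.
    """
    info = [stream_information(ll) for ll in frame]
    total = sum(info)
    if total < EPS:
        return [1.0 / len(frame)] * len(frame)
    return [i / total for i in info]


def integrate(frame):
    """Weighted sum of stream log-likelihoods for each active state."""
    w = stream_weights(frame)
    n_states = len(frame[0])
    return [sum(w[s] * frame[s][j] for s in range(len(frame)))
            for j in range(n_states)]
```

In this sketch, a stream whose likelihoods single out one active state receives nearly all of the weight, while a stream that scores every active state equally receives almost none. The likelihoods are reused as they are, which mirrors the summary's point that no additional likelihood calculation is needed.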
ISSN:	0916-8532, 1745-1361
DOI:	10.1093/ietisy/e91-d.3.815