Robust speech recognition by integrating speech separation and hypothesis testing
Published in | Speech communication Vol. 52; no. 1; pp. 72 - 81 |
---|---|
Main Authors | , |
Format | Journal Article |
Language | English |
Published | Amsterdam: Elsevier B.V, 2010 |
Summary: | Missing-data methods attempt to improve robust speech recognition by distinguishing between reliable and unreliable data in the time–frequency (T–F) domain. Such methods require a binary mask to label speech-dominant T–F regions of a noisy speech signal as reliable and the rest as unreliable. Current methods for computing the mask are based mainly on bottom-up cues such as harmonicity and produce labeling errors that degrade recognition performance. In this paper, we propose a two-stage recognition system that combines bottom-up and top-down cues in order to simultaneously improve both mask estimation and recognition accuracy. First, an n-best lattice consistent with a speech separation mask is generated. The lattice is then re-scored by expanding the mask using a model-based hypothesis test to determine the reliability of individual T–F units. Systematic evaluations of the proposed system show significant improvement in recognition performance compared to that using speech separation alone. |
---|---|
ISSN: | 0167-6393 1872-7182 |
DOI: | 10.1016/j.specom.2009.08.008 |
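
As an illustration of the binary mask mentioned in the summary, the sketch below labels each T–F unit as reliable (1) when speech dominates noise and unreliable (0) otherwise. It is not the paper's method: it assumes oracle access to separate speech and noise energies, which a real system would have to estimate, and the function name `binary_mask`, its parameters, and the 0 dB threshold are hypothetical choices made only for this example.

```python
import math
from typing import List


def binary_mask(
    speech_energy: List[List[float]],
    noise_energy: List[List[float]],
    threshold_db: float = 0.0,
) -> List[List[int]]:
    """Return a 0/1 mask over T-F units (hypothetical helper, oracle setting).

    A unit is marked reliable (1) when its local SNR in dB is strictly
    greater than threshold_db, and unreliable (0) otherwise.
    """
    mask = []
    for s_row, n_row in zip(speech_energy, noise_energy):
        row = []
        for s, n in zip(s_row, n_row):
            if s <= 0.0:
                # No speech energy: the unit cannot be speech-dominant.
                row.append(0)
            elif n <= 0.0:
                # Speech present and no noise: treat as reliable.
                row.append(1)
            else:
                snr_db = 10.0 * math.log10(s / n)
                row.append(1 if snr_db > threshold_db else 0)
        mask.append(row)
    return mask


# Toy 2x2 grid of energies (rows: frequency channels, columns: time frames).
speech = [[4.0, 1.0], [0.5, 9.0]]
noise = [[1.0, 2.0], [0.5, 1.0]]
print(binary_mask(speech, noise))
```

In a missing-data recognizer, units marked 0 would be treated as unreliable evidence; the summary's second stage revisits such labels with a model-based hypothesis test, which this sketch does not attempt.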