Applied principles of clear and Lombard speech for automated intelligibility enhancement in noisy environments

Bibliographic Details
Published in	Speech Communication, Vol. 48, no. 5, pp. 549-558
Main Authors	Skowronski, Mark D., Harris, John G.
Format	Journal Article
Language	English
Published	Amsterdam: Elsevier B.V., 01.05.2006
Subjects	Applied sciences; Clear speech; Detection, estimation, filtering, equalization, prediction; Energy redistribution; Exact sciences and technology; Information, signal and communications theory; Signal and communications theory; Signal processing; Signal, noise; Speech enhancement; Speech processing; Telecommunications and information theory; 43.60.Dh; 43.71.Es; 43.72.Ew; Performance evaluation; Vocabulary; Speech analysis; Segmentation; Speech intelligibility; Verbal perception; Background noise; High pass filter; Consonant; Accuracy; Vowel; Telecommunication system; Masking; Learning algorithm

More Information
Summary:	Previous studies have documented phenomena involving the modification of human speech in special communication circumstances. Whether speaking to a hearing-impaired person (clear speech) or in a noisy environment (Lombard speech), speakers tend to make similar modifications to their normal, conversational speaking style in order to increase the understanding of their message by the listener. One strategy characteristic of the above speech types is to increase consonant power relative to the signal power of adjacent vowels and is referred to as consonant–vowel (CV) ratio boosting. An automated method of speech enhancement using CV ratio boosting is called energy redistribution voiced/unvoiced (ERVU). To characterize the performance of ERVU, 25 listeners responded to 500 words in a two-word, forced-choice experiment in the presence of energetic masking noise. The test material was a vocabulary of confusable monosyllabic words spoken by 8 male and 8 female speakers, and the conditions tested were a control (unmodified speech), ERVU, and a high-pass filter (HPF). Both ERVU and the HPF significantly increased recognition accuracy compared to the control. Nine of the 16 speakers were significantly more intelligible when ERVU or the HPF was used, compared to the control, while no speaker was less intelligible. The results show that ERVU successfully increased intelligibility of speech using a simple automated segmentation algorithm, applicable to a wide variety of communication systems such as cell phones and public address systems.
ISSN:	0167-6393, 1872-7182
DOI:	10.1016/j.specom.2005.09.003
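
The CV ratio boosting idea described in the summary can be illustrated with a minimal Python sketch. This is not the authors' ERVU implementation, whose segmentation algorithm is not detailed in this record. The zero-crossing-rate classifier used here is a stand-in, and the function name boost_cv_ratio and the parameters frame_len, zcr_threshold, and boost_db are hypothetical choices made only for illustration. The sketch amplifies frames judged unvoiced (consonant-like), then rescales the signal so its total energy is unchanged, which is one plausible reading of "energy redistribution".

```python
import numpy as np


def boost_cv_ratio(signal, frame_len=160, zcr_threshold=0.25, boost_db=6.0):
    """Raise unvoiced-frame power relative to voiced frames at constant total energy.

    All parameter names and default values are illustrative assumptions.
    """
    x = np.asarray(signal, dtype=float)
    out = x.copy()
    n_frames = len(x) // frame_len
    if n_frames == 0:
        return out  # too short to frame; return unchanged

    frames = x[: n_frames * frame_len].reshape(n_frames, frame_len)

    # Zero-crossing rate: fraction of adjacent sample pairs whose sign differs.
    # A high rate is treated as "unvoiced" (assumed stand-in classifier).
    signs = np.signbit(frames)
    zcr = np.mean(signs[:, 1:] != signs[:, :-1], axis=1)
    unvoiced = zcr > zcr_threshold

    # Apply the amplitude gain to unvoiced frames only.
    gains = np.where(unvoiced, 10.0 ** (boost_db / 20.0), 1.0)
    boosted = frames * gains[:, None]

    # Rescale so the framed portion keeps its original energy.
    e_in = np.sum(frames ** 2)
    e_out = np.sum(boosted ** 2)
    if e_out > 0:
        boosted *= np.sqrt(e_in / e_out)

    out[: n_frames * frame_len] = boosted.ravel()
    return out


# Usage: a synthetic "vowel" (100 Hz tone) followed by a quiet noise "consonant".
rng = np.random.default_rng(0)
vowel = np.sin(2 * np.pi * 100 * np.arange(800) / 8000)
consonant = 0.1 * rng.standard_normal(800)
enhanced = boost_cv_ratio(np.concatenate([vowel, consonant]))
```

A frame-wise gain like this introduces discontinuities at frame boundaries; a practical system would likely smooth the gain over time, and would need a more robust voiced/unvoiced decision than zero-crossing rate alone.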