Human language identification with reduced segmental information

Bibliographic Details
Published in	Acoustical Science and Technology Vol. 23; no. 3; pp. 143 - 153
Main Authors	Mori, Kazuya; Murahara, Yuji; Arai, Takayuki; Komatsu, Masahiko; Aoyagi, Makiko
Format	Journal Article
Language	English
Published	Tokyo: Acoustical Society of Japan, 1 May 2002; Japan Science and Technology Agency
Subjects	Audition; Human perception; Language identification; Linguistics; OGI Multi-Language Telephone Speech Corpus; Prosody; Segmentals; Speech analysis; Suprasegmentals; White noise
ISSN	1346-3969; 1347-5177
DOI	10.1250/ast.23.143

Summary:	We conducted human language identification experiments with Japanese and bilingual subjects, using signals with reduced segmental information. American English and Japanese excerpts from the OGI Multi-Language Telephone Speech Corpus were processed by spectral-envelope removal (SER), vowel extraction from SER (VES) and temporal-envelope modulation (TEM), and the processed excerpts were presented as stimuli in perceptual experiments. From the subjects' responses we calculated D indices, which range from -2 to +2; positive values indicate correct responses and negative values incorrect ones. With the SER signal, in which the spectral envelope is eliminated, humans could still identify the languages fairly successfully: the overall D index of the Japanese subjects for this signal was +1.17. With the VES signal, which retains only the vowel sections of the SER signal, the D index was lower (+0.35). With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands, the D index rose from +0.29 to +1.69 as the number of bands increased from 1 to 4. Results varied with the stimulus language, and the Japanese and bilingual subjects scored differently. These results indicate that humans can identify languages from signals with drastically reduced segmental information. They also suggest variation due to the phonetic typologies of the languages and to the subjects' knowledge.
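The TEM condition described in the summary is a form of noise vocoding: each frequency band keeps only its temporal (intensity) envelope, which then modulates band-limited white noise. The Python sketch below illustrates that general idea and is not the authors' processing. The function names (`tem`, `bandpass_coeffs`, `envelope`, etc.) are hypothetical. The band layout (edges spaced evenly on a log-frequency axis between 100 and 3400 Hz), the RBJ-cookbook biquad band-pass filters, the 30 Hz one-pole envelope smoother and the final RMS level matching are all illustrative assumptions.

```python
import math
import random


def bandpass_coeffs(fs, f_lo, f_hi):
    """RBJ-cookbook band-pass biquad (0 dB peak gain) centred on the band's geometric mean."""
    f0 = math.sqrt(f_lo * f_hi)
    q = f0 / (f_hi - f_lo)
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    a0 = 1.0 + alpha
    b = (alpha / a0, 0.0, -alpha / a0)
    a = (-2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0)
    return b, a


def biquad(coeffs, x):
    """Direct-form-I filtering of a list of samples."""
    (b0, b1, b2), (a1, a2) = coeffs
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for xn in x:
        yn = b0 * xn + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, xn
        y2, y1 = y1, yn
        out.append(yn)
    return out


def band_filter(coeffs, x):
    # Two cascaded sections give a slightly steeper roll-off than one.
    return biquad(coeffs, biquad(coeffs, x))


def envelope(x, fs, cutoff=30.0):
    """Full-wave rectification followed by a one-pole low-pass smoother (assumed 30 Hz)."""
    a = math.exp(-2.0 * math.pi * cutoff / fs)
    env, y = [], 0.0
    for xn in x:
        y = (1.0 - a) * abs(xn) + a * y
        env.append(y)
    return env


def rms(x):
    return math.sqrt(sum(v * v for v in x) / len(x)) if x else 0.0


def tem(signal, fs, n_bands, f_min=100.0, f_max=3400.0, seed=None):
    """Hypothetical noise vocoder: keep each band's temporal envelope,
    replace its fine structure with band-limited white noise."""
    if n_bands < 1 or not 0.0 < f_min < f_max < fs / 2.0:
        raise ValueError("need n_bands >= 1 and 0 < f_min < f_max < fs/2")
    rng = random.Random(seed)
    # Assumed band layout: edges spaced evenly on a log-frequency axis.
    edges = [f_min * (f_max / f_min) ** (k / n_bands) for k in range(n_bands + 1)]
    out = [0.0] * len(signal)
    for lo, hi in zip(edges, edges[1:]):
        coeffs = bandpass_coeffs(fs, lo, hi)
        env = envelope(band_filter(coeffs, signal), fs)
        noise = band_filter(coeffs, [rng.gauss(0.0, 1.0) for _ in signal])
        scale = rms(noise) or 1.0  # unit-RMS carrier, so band level follows the envelope
        for i, (e, n) in enumerate(zip(env, noise)):
            out[i] += e * n / scale
    # Match the overall RMS level of the input.
    target, actual = rms(signal), rms(out)
    if actual == 0.0:
        return [0.0] * len(signal)
    return [v * target / actual for v in out]
```

For example, `tem(samples, 8000, 4, seed=0)` would give a four-band version of an 8 kHz telephone excerpt. Varying `n_bands` from 1 to 4 mirrors the manipulation whose effect on the D index the summary reports, although the actual band edges and filters used in the study are not given here.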