Human language identification with reduced segmental information

Bibliographic Details
Published in Acoustical Science and Technology Vol. 23; no. 3; pp. 143-153
Main Authors Mori, Kazuya; Murahara, Yuji; Arai, Takayuki; Komatsu, Masahiko; Aoyagi, Makiko
Format Journal Article
Language English
Published Tokyo, ACOUSTICAL SOCIETY OF JAPAN, 01.05.2002
Japan Science and Technology Agency
ISSN 1346-3969, 1347-5177
DOI 10.1250/ast.23.143

Summary: We conducted human language identification experiments with Japanese and bilingual subjects, using signals with reduced segmental information. American English and Japanese excerpts from the OGI Multi-Language Telephone Speech Corpus were processed by spectral-envelope removal (SER), vowel extraction from SER (VES), and temporal-envelope modulation (TEM). The processed excerpts were presented as stimuli in perceptual experiments. We calculated D indices from the subjects’ responses; these range from -2 to +2, where positive values indicate correct responses and negative values indicate incorrect responses. With the SER signal, in which the spectral envelope is eliminated, humans could still identify the languages fairly successfully: the overall D index of Japanese subjects for this signal was +1.17. With the VES signal, which retains only the vowel sections of the SER signal, the D index was lower (+0.35). With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands, the D index rose from +0.29 to +1.69 as the number of bands increased from 1 to 4. Results varied depending on the stimulus language, and Japanese and bilingual subjects scored differently from each other. These results indicate that humans can identify languages using signals with drastically reduced segmental information. They also suggest variation due to the phonetic typologies of the languages and the subjects’ knowledge.
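The summary gives only the range and sign convention of the D index, not its formula. The sketch below shows one way a score with those properties could be computed. It assumes a four-point response scale (the subject names a language and reports being sure or unsure) and a plain average over trials. The scale, the function names score_response and d_index, and the sample trials are all hypothetical and are not taken from the article.

```python
# Hypothetical scoring scheme (an assumption, not the article's stated method):
# a sure answer is worth 2 points, an unsure answer 1 point,
# positive when the named language is correct and negative when it is not.
def score_response(true_lang, answered_lang, sure):
    magnitude = 2 if sure else 1
    return magnitude if answered_lang == true_lang else -magnitude


def d_index(trials):
    """Average the per-trial scores; the result lies between -2 and +2."""
    scores = [score_response(true, answered, sure) for true, answered, sure in trials]
    return sum(scores) / len(scores)


# Made-up example trials: (stimulus language, answered language, sure?)
trials = [
    ("English", "English", True),    # sure and correct
    ("English", "English", False),   # unsure and correct
    ("Japanese", "English", False),  # unsure and incorrect
    ("Japanese", "Japanese", True),  # sure and correct
]
print(d_index(trials))
```

Under this assumed scheme, a D index near 0 would mean that correct and incorrect answers roughly cancel out, while values near +2 would mean confident, correct identification.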