Human language identification with reduced segmental information
Published in | Acoustical Science and Technology Vol. 23; no. 3; pp. 143 - 153 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Tokyo: Acoustical Society of Japan, 01.05.2002; Japan Science and Technology Agency |
ISSN | 1346-3969 1347-5177 |
DOI | 10.1250/ast.23.143 |
Summary: | We conducted human language identification experiments with Japanese and bilingual subjects, using signals with reduced segmental information. American English and Japanese excerpts from the OGI Multi-Language Telephone Speech Corpus were processed by spectral-envelope removal (SER), vowel extraction from SER (VES), and temporal-envelope modulation (TEM). The processed excerpts were presented as stimuli in perceptual experiments. We calculated D indices from the subjects’ responses, ranging from -2 to +2, where positive and negative values indicate correct and incorrect responses, respectively. With the SER signal, in which the spectral envelope is eliminated, humans could still identify the languages fairly successfully: the overall D index of Japanese subjects for this signal was +1.17. With the VES signal, which retains only the vowel sections of the SER signal, the D index was lower (+0.35). With the TEM signal, composed of white-noise-driven intensity envelopes from several frequency bands, the D index rose from +0.29 to +1.69 as the number of bands increased from 1 to 4. Results varied depending on the stimulus language, and Japanese and bilingual subjects scored differently from each other. These results indicate that humans can identify languages using signals with drastically reduced segmental information. The results also suggest variation due to the phonetic typologies of languages and subjects’ knowledge. |