Integrating Acoustic, Prosodic and Phonotactic Features for Spoken Language Identification

Bibliographic Details
Published in: 2006 IEEE International Conference on Acoustics, Speech and Signal Processing Proceedings, Vol. 1, p. I
Main Authors: Rong Tong, Bin Ma, Donglai Zhu, Haizhou Li, Eng Siong Chng
Format: Conference Proceeding
Language: English
Published: IEEE, 2006
Summary: The fundamental issue in automatic language identification is to find effective discriminative cues for languages. This paper studies the fusion of five features at different levels of abstraction for language identification: spectral, duration, pitch, n-gram phonotactic, and bag-of-sounds features. We build a system and report test results on the NIST 1996 and 2003 LRE datasets; the system was also built to participate in NIST 2005 LRE. The experimental results show that different levels of information provide complementary language cues. The prosodic features are more effective for shorter utterances, while the phonotactic features work better for longer utterances. For the 12-language task, the system with fusion of all five features achieved 2.38% EER on 30-sec speech segments of the NIST 1996 dataset.
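
As a rough illustration of the two ideas named in the summary, score-level fusion of several feature streams and the EER metric used to report results, the Python sketch below shows a weighted-sum fusion of per-feature language scores and a simple EER estimate. This is not the authors' implementation; the function names, weights, and toy scores are assumptions made for the example.

import numpy as np

def fuse_scores(feature_scores, weights):
    # Weighted sum of per-feature language scores.
    # feature_scores: list of (n_utterances, n_languages) arrays, one per
    # stream (e.g. spectral, prosodic, phonotactic); weights: one per stream.
    fused = np.zeros_like(feature_scores[0], dtype=float)
    for scores, w in zip(feature_scores, weights):
        fused += w * scores
    return fused

def equal_error_rate(target_scores, nontarget_scores):
    # EER: operating point where the miss rate (true-language trials rejected)
    # equals the false-alarm rate (wrong-language trials accepted).
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    candidates = []
    for t in thresholds:
        miss = np.mean(target_scores < t)
        fa = np.mean(nontarget_scores >= t)
        candidates.append((abs(miss - fa), (miss + fa) / 2.0))
    return min(candidates)[1]

# Toy usage: three hypothetical feature streams, 100 utterances, 12 languages.
rng = np.random.default_rng(0)
streams = [rng.normal(size=(100, 12)) for _ in range(3)]
fused = fuse_scores(streams, weights=[0.5, 0.3, 0.2])
eer = equal_error_rate(fused[:, 0], fused[:, 1])
print(fused.shape, round(100 * eer, 2))

In practice the fusion weights would be tuned on a development set, and EER would be computed per language detection task over NIST LRE trials rather than on random scores as in this toy example.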
ISBN: 9781424404698, 142440469X
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2006.1659993