A comparative study of explicit and implicit modelling of subsegmental speaker-specific excitation source information

Bibliographic Details
Published in	Sadhana (Bangalore) Vol. 38; no. 4; pp. 591 - 620
Main Authors	PATI, DEBADATTA; MAHADEVA PRASANNA, S R
Format	Journal Article
Language	English
Published	India: Springer India, 01.08.2013
Subjects	Derivatives; Engineering; explicit; implicit; Speaker-specific excitation source information; LP residual; subsegmental; LF model
Summary:	This paper experimentally compares explicit and implicit modelling of subsegmental excitation information. For explicit modelling, the static and dynamic values of the standard Liljencrants–Fant (LF) parameters that model the glottal flow derivative (GFD) are used. A simplified approximation method is proposed to compute these LF parameters by locating the glottal closing and opening instants. The proposed approach significantly reduces the computation needed to implement the LF model. For implicit modelling, linear prediction (LP) residual samples, taken in blocks of 5 ms with a shift of 2.5 ms, are used. Speaker recognition studies are performed on the NIST-99 and NIST-03 databases. In speaker identification, implicit modelling performs significantly better than explicit modelling. In speaker verification, by contrast, explicit modelling seems to perform better. This indicates that explicit modelling seems to have relatively less intra- and inter-speaker variability, whereas implicit modelling has more of both. What is desirable is less intra-speaker and more inter-speaker variability. Therefore, explicit modelling may be used for speaker verification and implicit modelling for speaker identification. Further, for both tasks, explicit modelling provides relatively more complementary information to the state-of-the-art vocal tract features, and the contribution of the explicit features is relatively more robust against noise. We suggest that the explicit approach can be used to model the subsegmental excitation information for speaker recognition.
ISSN:	0256-2499 0973-7677
DOI:	10.1007/s12046-013-0163-z
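
The implicit representation mentioned in the summary (LP residual samples in 5 ms blocks with a 2.5 ms shift) can be sketched roughly as follows. This is a minimal illustration, not the authors' procedure: the 8 kHz sampling rate, the LP order of 10, the whole-signal (rather than frame-wise) LP analysis, and the function names `lp_residual` and `block_frames` are all assumptions made for the example.

```python
import numpy as np

def lp_residual(signal, order=10):
    """LP residual via the autocorrelation method.

    Assumption: a single LP fit over the whole signal, for brevity.
    Practical systems usually fit LP coefficients per short frame.
    """
    s = np.asarray(signal, dtype=float)
    # Autocorrelation at lags 0..order
    r = np.array([np.dot(s[:len(s) - k], s[k:]) for k in range(order + 1)])
    # Toeplitz normal equations R a = r[1:], lightly regularised
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    R += 1e-9 * r[0] * np.eye(order)
    a = np.linalg.solve(R, r[1:])
    # Predicted sample: s_hat[n] = sum_k a[k-1] * s[n-k]
    pred = np.zeros_like(s)
    for k in range(1, order + 1):
        pred[k:] += a[k - 1] * s[:-k]
    return s - pred

def block_frames(x, fs=8000, block_ms=5.0, shift_ms=2.5):
    """Cut x into overlapping blocks (5 ms long, 2.5 ms shift by default)."""
    x = np.asarray(x, dtype=float)
    block = int(round(fs * block_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    if len(x) < block:
        return np.empty((0, block))
    n = 1 + (len(x) - block) // shift
    return np.stack([x[i * shift:i * shift + block] for i in range(n)])

# Usage with a synthetic stand-in for one second of speech at 8 kHz
rng = np.random.default_rng(0)
residual = lp_residual(rng.standard_normal(8000))
blocks = block_frames(residual)  # one row per 5 ms block of residual samples
```

Under the assumed 8 kHz rate, each block holds 40 samples and consecutive blocks overlap by 20 samples; each row would then serve as one feature vector for a speaker model.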