Mixture linear prediction Gammatone Cepstral features for robust speaker verification under transmission channel noise

Bibliographic Details
Published in	Multimedia tools and applications Vol. 79; no. 25-26; pp. 18679 - 18693
Main Authors	Krobba, Ahmed, Debyeche, Mohamed, Selouani, Sid-Ahmed
Format	Journal Article
Language	English
Published	New York: Springer US, 01.07.2020; Springer Nature B.V.
Subjects	Channel noise; Coefficients; Computer Communication Networks; Computer Science; Data Structures and Information Theory; Discriminant analysis; Fading; Linear prediction; Multimedia Information Systems; Noise; Noise levels; Normal distribution; Random noise; Robustness; Special Purpose and Application-Based Systems; Speech; Verification; I-vector; GPLDA; Transmission channel noise; Automatic speaker verification; Mixture linear prediction; Gammatone Frequency Cepstral Coefficients (GFCCs)
Summary:	In this paper, we present a Mixture Linear Prediction based approach for extracting robust Gammatone Cepstral Coefficients (MLPGCCs). The proposed method improves the performance of Automatic Speaker Verification (ASV) using i-vector and Gaussian Probabilistic Linear Discriminant Analysis (GPLDA) modeling under transmission channel noise. The performance of the extracted MLPGCCs was evaluated using the NIST 2008 database, in which conversational speech was recorded with a single-channel microphone. The system is analyzed in the presence of different transmission channel noises, namely Additive White Gaussian Noise (AWGN) and Rayleigh fading, at various Signal-to-Noise Ratio (SNR) levels. The evaluation results show that the MLPGCC features are promising for the ASV task. Indeed, speaker verification performance using the proposed MLPGCC features is significantly improved compared to the conventional Gammatone Frequency Cepstral Coefficients (GFCCs) and Mel Frequency Cepstral Coefficients (MFCCs) features. For speech signals corrupted with AWGN at SNRs ranging from -5 dB to 15 dB, we obtain a significant reduction of the Equal Error Rate (EER), ranging from 9.41% to 6.65% and from 3.72% to 1.50% compared with conventional MFCC and GFCC features, respectively. In addition, when the test speech signals are corrupted by a Rayleigh fading channel, we achieve an EER reduction ranging from 23.63% to 7.8% and from 10.88% to 6.8% compared with conventional MFCCs and GFCCs, respectively. We also found that the combination of GFCCs and MLPGCCs gives the highest speaker verification performance. The best-performing combination achieves EERs ranging from 0.43% to 0.59% and from 1.92% to 3.88%.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-020-08748-2
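
The evaluation conditions named in the summary (AWGN at a target SNR, Rayleigh fading, and EER scoring) can be illustrated with a minimal Python/NumPy sketch. This is not the authors' implementation: the function names `add_awgn`, `rayleigh_flat_fade`, and `equal_error_rate` are hypothetical, the fading model is a crude simplification (an independent gain per sample with unit mean-square value), and this record does not say how the authors generated the noise or computed the EER.

```python
import numpy as np


def add_awgn(signal, snr_db, rng):
    """Add white Gaussian noise so that the result has the requested SNR in dB."""
    signal_power = np.mean(signal ** 2)
    # SNR(dB) = 10 * log10(P_signal / P_noise), solved for P_noise.
    noise_power = signal_power / (10.0 ** (snr_db / 10.0))
    noise = rng.normal(0.0, np.sqrt(noise_power), size=signal.shape)
    return signal + noise


def rayleigh_flat_fade(signal, rng):
    """Multiply each sample by an independent Rayleigh gain (assumed toy model)."""
    # A Rayleigh variable with scale s has mean-square value 2 * s**2,
    # so s = sqrt(0.5) keeps the average signal power unchanged.
    gains = rng.rayleigh(scale=np.sqrt(0.5), size=signal.shape)
    return gains * signal


def equal_error_rate(target_scores, impostor_scores):
    """Estimate the EER, where false-accept and false-reject rates are closest."""
    target_scores = np.asarray(target_scores, dtype=float)
    impostor_scores = np.asarray(impostor_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, impostor_scores]))
    # A trial is accepted when its score is at or above the threshold.
    far = np.array([np.mean(impostor_scores >= t) for t in thresholds])
    frr = np.array([np.mean(target_scores < t) for t in thresholds])
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic 1-second tone at 8 kHz stands in for a speech signal.
    t = np.arange(8000) / 8000.0
    clean = np.sin(2.0 * np.pi * 440.0 * t)
    for snr_db in (-5, 0, 5, 10, 15):
        noisy = add_awgn(clean, snr_db, rng)
        measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean((noisy - clean) ** 2))
        print(f"target SNR {snr_db:>3} dB, measured {measured:.2f} dB")
    # Synthetic verification scores, purely illustrative.
    targets = rng.normal(2.0, 1.0, size=1000)
    impostors = rng.normal(0.0, 1.0, size=1000)
    print(f"EER on synthetic scores: {100.0 * equal_error_rate(targets, impostors):.2f}%")
```

In an actual experiment the corrupted waveforms would be passed through feature extraction (MFCC, GFCC, or MLPGCC) and the i-vector/GPLDA back end before scoring; only the corruption and scoring steps are sketched here.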