Hybrid DWT and MFCC feature warping for noisy forensic speaker verification in room reverberation

Bibliographic Details
Published in	2017 IEEE International Conference on Signal and Image Processing Applications (ICSIPA) pp. 434 - 439
Main Authors	Al-Ali, Ahmed Kamil Hasan, Senadji, Bouchra, Chandran, Vinod
Format	Conference Proceeding
Language	English
Published	IEEE 01.09.2017
Subjects	Discrete wavelet transforms; DWT; Feature extraction; feature-warped MFCC; Forensics; Mel frequency cepstral coefficient; Noise measurement; Noisy forensic speaker verification; Reverberation; Reverberation conditions; Speech

More Information
Summary:	The robustness of speaker verification systems is often degraded in real forensic applications, which contain environmental noise and reverberation. Reverberation results in mismatched conditions between enrolment and test speech signals. In this work, we investigate the effectiveness of combining features of the discrete wavelet transform (DWT) and feature-warped mel frequency cepstral coefficients (MFCCs) to improve the performance of speaker verification under conditions of reverberation and environmental noise. State-of-the-art intermediate vector (i-vector) and probabilistic linear discriminant analysis (PLDA) were used as a classifier. The algorithm was evaluated by convolving the room impulse response with enrolment speech from an Australian forensic voice comparison database. The test speech signals were combined with car, street, and home noises from the QUT-NOISE database at signal-to-noise ratios (SNRs) ranging from -10 dB to 10 dB. Experimental results indicate that the algorithm achieves a reduction in average equal error rate (EER) ranging from 17.10% to 51.86% over traditional MFCC features when the enrolment data are reverberated and the test speech signals are corrupted with car, street, and home noises at SNRs ranging from -10 dB to 10 dB.
DOI:	10.1109/ICSIPA.2017.8120650
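
Illustration:	The evaluation conditions described in the summary (reverberated enrolment speech, test speech mixed with noise at a target SNR) can be sketched roughly as follows. This is a minimal sketch and not the authors' code: the function names `reverberate` and `mix_at_snr` are hypothetical, the signals are synthetic stand-ins for the forensic speech, QUT-NOISE recordings, and measured room impulse response, and the 5 dB SNR step is an assumption (the record gives only the -10 dB to 10 dB range).

```python
import numpy as np


def reverberate(speech, rir):
    """Convolve speech with a room impulse response, trimmed to the input length."""
    return np.convolve(speech, rir)[: len(speech)]


def mix_at_snr(speech, noise, snr_db):
    """Add noise to speech, scaled so the speech-to-noise power ratio is snr_db."""
    # Repeat or trim the noise so it covers the whole utterance.
    noise = np.resize(noise, len(speech))
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to speech_power / 10^(snr_db / 10).
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise


# Synthetic placeholders (assumptions): real experiments would load recordings.
rng = np.random.default_rng(0)
speech = rng.standard_normal(16000)   # stand-in for one second of speech at 16 kHz
noise = rng.standard_normal(8000)     # stand-in for a car/street/home noise clip
rir = np.exp(-np.arange(800) / 100.0) * rng.standard_normal(800)  # decaying toy RIR

# Enrolment side: reverberated speech. Test side: noisy speech at several SNRs.
enrol = reverberate(speech, rir)
tests = {snr: mix_at_snr(speech, noise, snr) for snr in (-10, -5, 0, 5, 10)}
```

Feature extraction (DWT plus feature-warped MFCCs) and i-vector/PLDA scoring would then be applied to `enrol` and to each entry of `tests`; those stages are outside the scope of this sketch.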