Replay spoofing countermeasures using high spectro-temporal resolution features

Bibliographic Details
Published in	International Journal of Speech Technology, Vol. 22, no. 1, pp. 271-281
Main Authors	Alluri, K. N. R. K. Raju; Vuppala, Anil Kumar
Format	Journal Article
Language	English
Published	New York: Springer US, 15.03.2019; Springer Nature B.V.
Subjects	Artificial Intelligence; Cepstral analysis; Cues; Engineering; Error analysis; Feature extraction; Probabilistic models; Signal, Image and Speech Processing; Social Sciences; Speech recognition; Speech synthesis; Spoofing; Temporal resolution; Time-frequency analysis; Replay attacks; Spoofing counter measures; Deep neural networks; Single frequency filtering; Zero time windowing; Gaussian mixture models; Automatic speaker recognition
Summary:	Because replay attacks are easy for a fraudster to mount, they pose a more severe threat to automatic speaker verification (ASV) technology than other spoofing attacks such as speech synthesis and voice conversion. In a replay attack, a fraudster attempts to gain illegitimate access to an ASV system by playing back a speech sample collected from a genuine target speaker. The significant cues that differentiate genuine recordings from replayed ones are channel characteristics. To capture these characteristics, one needs to extract features from a spectrum that has high spectral and temporal resolution. Zero time windowing (ZTW) analysis of speech is one such time-frequency analysis technique; it yields a spectrum with high spectral and temporal resolution at each sampling instant. In this study, new features are proposed by applying cepstral analysis to the ZTW spectrum. Experiments are performed on two publicly available replay attack databases, BTAS 2016 and ASVspoof 2017. The first set of experiments is conducted using Gaussian mixture models to evaluate the potential of the proposed features. The proposed system achieves a half total error rate of 0.75% on the BTAS 2016 evaluation set and an equal error rate of 14.75% on the ASVspoof 2017 evaluation set. Score-level fusion is then performed by combining the proposed features with the previously proposed single frequency filtering cepstral coefficients. The fused result outperforms the previously reported best results on these two datasets.
ISSN:	1381-2416 1572-8110
DOI:	10.1007/s10772-019-09602-z
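
Note on the metrics in the summary: the half total error rate (HTER) and equal error rate (EER) quoted above are both derived from the false acceptance and false rejection rates of a detector, and score-level fusion combines the outputs of two systems before thresholding. The sketch below illustrates these quantities in general terms; it is not the authors' code. The function names, the convention that higher scores mean "genuine", the equal fusion weight, and the toy scores are all assumptions made for illustration.

```python
import numpy as np


def far_frr(genuine, spoof, threshold):
    """Error rates when trials scoring >= threshold are accepted as genuine."""
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    far = float(np.mean(spoof >= threshold))   # replayed trials wrongly accepted
    frr = float(np.mean(genuine < threshold))  # genuine trials wrongly rejected
    return far, frr


def hter(genuine, spoof, threshold):
    """Half total error rate: mean of FAR and FRR at a threshold fixed in advance
    (typically chosen on a development set, an assumption here)."""
    far, frr = far_frr(genuine, spoof, threshold)
    return 0.5 * (far + frr)


def eer(genuine, spoof):
    """Approximate equal error rate: sweep the observed scores as thresholds and
    report the error where FAR and FRR are closest to each other."""
    genuine = np.asarray(genuine, dtype=float)
    spoof = np.asarray(spoof, dtype=float)
    thresholds = np.sort(np.concatenate([genuine, spoof]))
    rates = [far_frr(genuine, spoof, t) for t in thresholds]
    far, frr = min(rates, key=lambda r: abs(r[0] - r[1]))
    return 0.5 * (far + frr)


def fuse_scores(scores_a, scores_b, weight=0.5):
    """Weighted-sum score-level fusion of two systems scored on the same trials.
    The weight is a hypothetical parameter; in practice it would be tuned."""
    a = np.asarray(scores_a, dtype=float)
    b = np.asarray(scores_b, dtype=float)
    return weight * a + (1.0 - weight) * b


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    genuine_scores = [1.0, 3.0]
    spoof_scores = [0.0, 2.0]
    print("EER:", eer(genuine_scores, spoof_scores))
    print("HTER at threshold 2.0:", hter(genuine_scores, spoof_scores, 2.0))
    print("Fused:", fuse_scores([1.0, 2.0], [3.0, 4.0]))
```

The EER needs no preset threshold, whereas the HTER depends on one, which is one reason results on the two databases are reported with different metrics and are not directly comparable.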