Spoof Detection using Voice Contribution on LFCC features and ResNet-34

Bibliographic Details
Published in: 2023 18th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), pp. 1-6
Main Authors: Mon, Khaing Zar; Galajit, Kasorn; Mawalim, Candy Olivia; Karnjana, Jessada; Isshiki, Tsuyoshi; Aimmanee, Pakinee
Format: Conference Proceeding
Language: English
Published: IEEE, 27.11.2023
Summary: Biometric authentication, especially speaker verification, has seen significant advances recently. Despite these strides, compelling evidence highlights an ongoing vulnerability to spoofing attacks, requiring specialized countermeasures to detect various attack types. This paper focuses on detecting replay, speech synthesis, and voice conversion attacks. In our spoof detection strategy, we employed linear frequency cepstral coefficients (LFCC) for front-end feature extraction and ResNet-34 for distinguishing between genuine and fake speech. By integrating LFCC with ResNet-34, we evaluated the proposed method on the ASVspoof 2019 dataset, using both the PA (Physical Access) and LA (Logical Access) partitions. In our approach, we contrast extracting features from the entire utterance with an alternative that extracts features only from a specified percentage of the voiced segment within each utterance, for both the PA and LA datasets. In addition, we conducted a comprehensive evaluation by comparing the proposed method with the established baseline techniques, LFCC-GMM and CQCC-GMM. The proposed method achieves promising performance, with equal error rates (EERs) of 3.11% and 3.49% for replay attacks (PA) on the development and evaluation sets, respectively. For voice conversion and speech synthesis attacks (LA), it achieves EERs of 0.16% on the development set and 6.89% on the evaluation set. These results indicate that the proposed method is promising for detecting both PA and LA spoofing attacks.
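As a rough illustration of the kind of pipeline the abstract describes, the sketch below pairs an LFCC front end with a ResNet-34 back end and includes an EER helper matching the reported metric. It assumes torchaudio, torchvision, NumPy, and scikit-learn; the specific settings (16 kHz audio, 40 LFCCs, FFT/hop sizes, a two-class head, the compute_eer helper, and the choice of class 1 as bona fide) are illustrative placeholders rather than the authors' exact configuration, and the paper's voiced-segment selection step is not shown.

import numpy as np
import torch
import torch.nn as nn
import torchaudio
import torchvision
from sklearn.metrics import roc_curve

# Front end: linear frequency cepstral coefficients from the raw waveform.
lfcc_extractor = torchaudio.transforms.LFCC(
    sample_rate=16000,
    n_lfcc=40,
    speckwargs={"n_fft": 512, "hop_length": 160, "win_length": 400},
)

# Back end: ResNet-34 with a single-channel first convolution (an LFCC
# "image" has one channel) and a two-class head (bona fide vs. spoof).
classifier = torchvision.models.resnet34(weights=None)
classifier.conv1 = nn.Conv2d(1, 64, kernel_size=7, stride=2, padding=3, bias=False)
classifier.fc = nn.Linear(classifier.fc.in_features, 2)
classifier.eval()

# Dummy 4-second utterance at 16 kHz standing in for an ASVspoof recording.
waveform = torch.randn(1, 16000 * 4)
with torch.no_grad():
    features = lfcc_extractor(waveform)         # shape (1, n_lfcc, frames)
    logits = classifier(features.unsqueeze(1))  # add channel dim -> (1, 2)
    score = torch.softmax(logits, dim=-1)[0, 1].item()  # class 1 taken as bona fide

def compute_eer(labels, scores):
    # Equal error rate: labels are 1 for bona fide, 0 for spoof; higher
    # scores mean "more bona fide". Returns the point where FPR equals FNR.
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    idx = np.nanargmin(np.abs(fnr - fpr))
    return (fpr[idx] + fnr[idx]) / 2.0

In an actual experiment, the network would be trained on the ASVspoof 2019 training partitions and scored on the development and evaluation sets, with a function like compute_eer producing the reported EERs.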
ISSN: 2831-4565
DOI: 10.1109/iSAI-NLP60301.2023.10354625