Fake speech detection using VGGish with attention block
Published in | EURASIP Journal on Audio, Speech, and Music Processing, Vol. 2024, No. 1, pp. 35–19 |
---|---|
Main Authors | , , , , |
Format | Journal Article |
Language | English |
Published | Cham: Springer International Publishing (Springer Nature B.V., SpringerOpen), 26.06.2024 |
Subjects | |
Summary: | While deep learning technologies have made remarkable progress in generating deepfakes, their misuse has become a well-known concern. The ubiquitous use of deepfakes to spread false information poses significant risks to the security and privacy of individuals. The primary objective of audio spoofing detection is to identify audio generated through AI-based techniques. Several fake audio detection techniques based on machine learning already exist; however, they lack generalization and may not identify all types of AI-synthesized audio, such as replay attacks, voice conversion, and text-to-speech (TTS). In this paper, a deep layered model, VGGish, combined with an attention block, the Convolutional Block Attention Module (CBAM), is introduced for spoofing detection. The proposed model converts input audio into mel-spectrograms, extracts the most representative features through the attention block, and classifies each input into two classes: Fake and Real. Its simple layered architecture makes it a practical technique for audio spoofing detection, and the attention module's combined spatial and channel features allow it to capture complex relationships in audio signals. To evaluate the model's effectiveness, in-depth testing was conducted on the ASVspoof 2019 dataset. The proposed technique achieved an EER of 0.52% for Physical Access (PA) attacks and 0.07% for Logical Access (LA) attacks. |
---|---|
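The CBAM block mentioned in the summary applies channel attention followed by spatial attention to a convolutional feature map. A minimal NumPy sketch of that two-step refinement is shown below; the random weight matrices stand in for CBAM's learned shared MLP, and a fixed sigmoid over the pooled maps stands in for its learned 7×7 convolution, so this illustrates only the data flow, not the paper's trained model:

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def channel_attention(x, reduction=2):
    # x: feature map of shape (C, H, W).
    # CBAM pools over the spatial dims (avg and max), passes both
    # descriptors through a shared 2-layer MLP, and sums the results.
    C = x.shape[0]
    avg = x.mean(axis=(1, 2))                      # (C,)
    mx = x.max(axis=(1, 2))                        # (C,)
    rng = np.random.default_rng(0)
    W1 = rng.standard_normal((C // reduction, C)) * 0.1  # toy weights
    W2 = rng.standard_normal((C, C // reduction)) * 0.1  # toy weights
    mlp = lambda v: W2 @ np.maximum(W1 @ v, 0.0)   # ReLU hidden layer
    scale = sigmoid(mlp(avg) + mlp(mx))            # per-channel gate, (C,)
    return x * scale[:, None, None]

def spatial_attention(x):
    # CBAM pools over the channel dim (avg and max) and derives a
    # per-pixel gate; a fixed combination replaces the learned conv here.
    avg = x.mean(axis=0)                           # (H, W)
    mx = x.max(axis=0)                             # (H, W)
    scale = sigmoid((avg + mx) / 2.0)              # per-pixel gate
    return x * scale[None, :, :]

def cbam(x):
    # Channel attention first, then spatial attention, as in CBAM.
    return spatial_attention(channel_attention(x))

# Toy feature map, e.g. an intermediate VGGish activation on a
# mel-spectrogram input (shapes are illustrative only).
feat = np.random.default_rng(1).standard_normal((8, 4, 4))
out = cbam(feat)
print(out.shape)  # attention rescales values but preserves the shape
```

The gates in both steps lie in (0, 1), so CBAM reweights features rather than adding new ones, which is why it can be dropped into an existing backbone such as VGGish without changing tensor shapes.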
ISSN: | 1687-4722, 1687-4714 |
DOI: | 10.1186/s13636-024-00348-4 |