Every Breath You Don't Take: Deepfake Speech Detection Using Breath
Main Authors | |
---|---|
Format | Journal Article |
Language | English |
Published | 23.04.2024 |
Summary:
Deepfake speech represents a real and growing threat to systems and society.
Many detectors have been created to help defend against speech deepfakes.
While these detectors implement a wide range of methodologies, many rely on
low-level artifacts of the speech generation process. We hypothesize that
breath, a higher-level part of speech, is a key component of natural speech,
and thus that its improper generation in deepfake speech makes it a performant
discriminator. To evaluate this, we create a breath detector and apply it to a
custom dataset of online news article audio to discriminate between real and
deepfake speech. Additionally, we make this custom dataset publicly available
to facilitate comparison in future work. Applying our simple breath detector as
a deepfake speech discriminator on in-the-wild samples allows for accurate
classification (perfect 1.0 AUPRC and 0.0 EER on test data) across 33.6 hours
of audio. We compare our model with the state-of-the-art SSL-wav2vec model and
show that this complex deep learning model completely fails to classify the
same in-the-wild samples (0.72 AUPRC and 0.99 EER).

DOI: 10.48550/arxiv.2404.15143
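The summary reports detection quality as AUPRC (area under the precision-recall curve) and EER (equal error rate). As a rough illustration of how these two metrics can be computed from per-clip detector scores, here is a minimal pure-Python sketch; it is not the authors' evaluation code and does not reproduce the breath detector itself. It assumes label 1 means deepfake, label 0 means real, and that a higher score means "more likely deepfake". AUPRC is estimated as average precision, which is one common estimator, and EER is approximated at the swept threshold where the false-positive and false-negative rates are closest. The function names and the scores in the usage example are hypothetical toy values, not data from the paper.

```python
from typing import Sequence


def equal_error_rate(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Approximate the EER by sweeping a threshold over every observed score.

    A clip is flagged as deepfake (positive, label 1) when score >= threshold.
    The EER is taken where FPR and FNR are closest, averaging the two.
    """
    pos = sum(1 for y in labels if y == 1)
    neg = len(labels) - pos
    if pos == 0 or neg == 0:
        raise ValueError("need both real (0) and deepfake (1) samples")
    # Candidate thresholds: every observed score, plus +inf, which flags
    # nothing (FPR = 0, FNR = 1).
    thresholds = sorted(set(scores)) + [float("inf")]
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        fpr, fnr = fp / neg, fn / pos
        gap = abs(fpr - fnr)
        if gap < best_gap:
            best_gap, eer = gap, (fpr + fnr) / 2
    return eer


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Estimate AUPRC as average precision: the mean of the precision values
    measured at the rank of each true deepfake, ranking clips by score."""
    pos = sum(1 for y in labels if y == 1)
    if pos == 0:
        raise ValueError("need at least one deepfake (1) sample")
    ranked = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    hits, total = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / pos


if __name__ == "__main__":
    # Hypothetical toy scores: every deepfake outscores every real clip.
    scores = [0.9, 0.8, 0.7, 0.2, 0.1]
    labels = [1, 1, 1, 0, 0]
    print(average_precision(scores, labels), equal_error_rate(scores, labels))  # 1.0 0.0
```

Perfectly separated scores, as in the toy usage example, yield the ideal values of AUPRC 1.0 and EER 0.0. Any case where a real clip outscores a deepfake clip lowers AUPRC below 1.0 and raises EER above 0.0.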