Maskmark: Robust Neuralwatermarking for Real and Synthetic Speech

High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can combat misuse by embedding a traceable signature in generated audio. However, existing audio watermarks typically demonstrate robustness to only a small set of transformations of t...

Full description

Saved in:
Bibliographic Details
Published inProceedings of the ... IEEE International Conference on Acoustics, Speech and Signal Processing (1998) pp. 4650 - 4654
Main Authors O'Reilly, Patrick, Jin, Zeyu, Su, Jiaqi, Pardo, Bryan
Format Conference Proceeding
LanguageEnglish
Published IEEE 14.04.2024
Subjects
Online AccessGet full text

Cover

Loading…
More Information
Summary:High-quality speech synthesis models may be used to spread misinformation or impersonate voices. Audio watermarking can combat misuse by embedding a traceable signature in generated audio. However, existing audio watermarks typically demonstrate robustness to only a small set of transformations of the watermarked audio. To address this, we propose MaskMark, a neural network-based digital audio watermarking technique optimized for speech. MaskMark embeds a secret key vector in audio via a multiplicative spectrogram mask, allowing the detection of watermarked speech segments even under substantial signal-processing or neural network-based transformations. Comparisons to a state-of-the-art baseline on natural and synthetic speech corpora and a human subjects evaluation demonstrate MaskMark's superior robustness in detecting watermarked speech while maintaining high perceptual transparency.
ISSN:2379-190X
DOI:10.1109/ICASSP48485.2024.10447253