Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining
Main Authors: , , , ,
Format: Journal Article
Language: English
Published: 23.10.2024
Summary: Audio can disclose personally identifiable information (PII), particularly when combined with related text data. It is therefore essential to develop tools that detect privacy leakage in Contrastive Language-Audio Pretraining (CLAP). Existing membership inference attacks (MIAs) require audio as input, risking voiceprint exposure and relying on costly shadow models. We first propose PRMID, a membership inference detector based on the probability ranking given by CLAP, which does not require training shadow models but still needs both the audio and text of the individual as input. To address these limitations, we then propose USMID, a textual unimodal speaker-level membership inference detector that queries the target model using only text data. We randomly generate textual gibberish that is clearly not in the training dataset, extract feature vectors from these texts using the CLAP model, and train a set of anomaly detectors on them. During inference, the feature vector of each test text is fed into the anomaly detectors to determine whether the speaker is in the training set (anomalous) or not (normal). If real audio of the tested speaker is available, USMID can further enhance detection by integrating it. Extensive experiments on various CLAP model architectures and datasets demonstrate that USMID outperforms baseline methods while using only text data.
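The USMID pipeline summarized above (generate gibberish, embed it with the CLAP text encoder, fit anomaly detectors, flag anomalous test texts as members) lends itself to a short sketch. The snippet below is a minimal illustration, not the paper's implementation: the checkpoint laion/clap-htsat-unfused, the character-level gibberish generator, the sample count of 512, and the choice of IsolationForest as the anomaly detector are all assumptions made here for the sake of a runnable example.

```python
import random
import string

import torch
from sklearn.ensemble import IsolationForest
from transformers import ClapModel, ClapProcessor

# Load a public CLAP checkpoint as a stand-in for the audited target model.
model = ClapModel.from_pretrained("laion/clap-htsat-unfused")
processor = ClapProcessor.from_pretrained("laion/clap-htsat-unfused")
model.eval()

def random_gibberish(length: int = 12) -> str:
    """Random character strings, which are almost surely absent from training data."""
    return "".join(random.choices(string.ascii_lowercase + " ", k=length))

@torch.no_grad()
def text_features(texts):
    """Extract CLAP text-encoder embeddings for a batch of strings."""
    inputs = processor(text=texts, return_tensors="pt", padding=True)
    return model.get_text_features(**inputs).cpu().numpy()

# Fit an anomaly detector on features of known non-member (gibberish) texts.
gibberish = [random_gibberish() for _ in range(512)]
detector = IsolationForest(random_state=0).fit(text_features(gibberish))

def predict_member(text: str) -> bool:
    """Flag a speaker as a training-set member if the text's feature vector
    looks anomalous relative to the gibberish (non-member) population."""
    return detector.predict(text_features([text]))[0] == -1

print(predict_member("a clear, well-formed caption describing a speaker"))
```

"Anomalous" here means "unlike gibberish": per the abstract's intuition, text the model encountered during training is embedded differently from text it has certainly never seen, so deviation from the gibberish feature distribution is read as membership.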
DOI: 10.48550/arxiv.2410.18371