Multimodal Graph Learning for Deepfake Detection
Main Authors | , , , , , , |
---|---|
Format | Journal Article |
Language | English |
Published | 12.09.2022 |
Summary: | Existing deepfake detectors face several challenges in achieving robustness
and generalization. One of the primary reasons is their limited ability to
extract relevant information from forgery videos, especially in the presence of
various artifacts such as spatial, frequency, temporal, and landmark
mismatches. Current detectors rely either on pixel-level features, which are
easily affected by unknown disturbances, or on facial landmarks, which do not
provide sufficient information. Furthermore, most detectors cannot utilize information
from multiple domains for detection, leading to limited effectiveness in
identifying deepfake videos. To address these limitations, we propose a novel
framework, Multimodal Graph Learning (MGL), that leverages information
from multiple modalities using two GNNs and several multimodal fusion modules.
At the frame level, we employ a bi-directional cross-modal transformer and an
adaptive gating mechanism to combine the features from the spatial and
frequency domains with the geometric-enhanced landmark features captured by a
GNN. At the video level, we use a Graph Attention Network (GAT) to represent
each frame in a video as a node in a graph and encode temporal information into
the edges of the graph to extract temporal inconsistency between frames. Our
proposed method aims to effectively identify and utilize distinguishing
features for deepfake detection. We evaluate the effectiveness of our method
through extensive experiments on widely used benchmarks and demonstrate that
our method outperforms the state-of-the-art detectors in terms of
generalization ability and robustness against unknown disturbances. |
---|---|
DOI: | 10.48550/arxiv.2209.05419 |
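
The summary names two mechanisms that are easier to follow with a small example: an adaptive gate that blends two feature streams at the frame level, and graph attention over a video in which each frame is a node and edges carry temporal information. The following is a minimal pure-Python sketch of those two ideas, not the authors' implementation. All function names, the scalar gate, the `window` neighbourhood, the identity feature projection, and the `edge_w * |dt|` edge term are assumptions made for illustration; the record does not specify any of them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def leaky_relu(x, slope=0.2):
    return x if x > 0 else slope * x


def gated_fusion(a, b, w, bias=0.0):
    """Blend two equal-length feature lists with one scalar gate (assumed form).

    gate = sigmoid(w . [a; b] + bias), where w has length 2 * len(a).
    """
    assert len(a) == len(b) and len(w) == 2 * len(a)
    concat = list(a) + list(b)
    gate = sigmoid(sum(wi * xi for wi, xi in zip(w, concat)) + bias)
    return [gate * ai + (1.0 - gate) * bi for ai, bi in zip(a, b)]


def temporal_edges(num_frames, window=1):
    """Link each frame to frames at most `window` steps away (self included).

    Returns {i: [(j, dt), ...]} where dt = j - i is the signed frame offset
    stored on the edge. The windowed layout is a hypothetical choice.
    """
    return {
        i: [(j, j - i)
            for j in range(max(0, i - window), min(num_frames, i + window + 1))]
        for i in range(num_frames)
    }


def frame_graph_attention(frames, edges, att, edge_w=0.0):
    """One GAT-style aggregation step over the frame graph.

    score(i, j) = LeakyReLU(att . [h_i; h_j]) + edge_w * |dt|
    Scores are softmax-normalised over the neighbours of i. The learned
    projection of a full GAT layer is replaced by the identity here.
    """
    out = []
    for i, h_i in enumerate(frames):
        scores = []
        for j, dt in edges[i]:
            pair = list(h_i) + list(frames[j])
            raw = sum(ak * xk for ak, xk in zip(att, pair))
            scores.append(leaky_relu(raw) + edge_w * abs(dt))
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]  # numerically stable softmax
        total = sum(exps)
        agg = [0.0] * len(h_i)
        for (j, _), e in zip(edges[i], exps):
            for k, v in enumerate(frames[j]):
                agg[k] += (e / total) * v
        out.append(agg)
    return out


# Toy usage: fuse two 2-d streams per frame, then attend across 3 frames.
spatial_frequency = [[1.0, 0.0], [0.8, 0.2], [0.1, 0.9]]
landmark = [[0.0, 1.0], [0.1, 0.9], [0.9, 0.1]]
fused = [gated_fusion(s, l, w=[0.5, -0.5, 0.5, -0.5])
         for s, l in zip(spatial_frequency, landmark)]
video_nodes = frame_graph_attention(
    fused, temporal_edges(len(fused)), att=[1.0, 0.0, 0.0, 1.0], edge_w=-0.5)
```

With a negative `edge_w`, frames further apart in time receive less attention, which is one simple way to let an edge attribute influence how neighbouring frames are weighted. A real model would learn the gate weights, the attention vector, and a feature projection rather than fixing them by hand.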