Deepfake source detection in a heart beat

Bibliographic Details
Published in: The Visual Computer, Vol. 40, No. 4, pp. 2733–2750
Main Authors: Çiftçi, Umur Aybars; Demir, İlke; Yin, Lijun
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg (Springer Nature B.V.), 01.04.2024

Summary: Fake portrait video generation techniques pose a new threat to society, as photorealistic deepfakes are used for political propaganda, celebrity impersonation, forged evidence, and other identity-related manipulations. Against these generation techniques, some detection approaches have proven useful thanks to their high classification accuracy. Nevertheless, almost no effort has been spent on tracking down the source of deepfakes. We propose an approach not only to separate deepfakes from real videos, but also to discover the specific generative model behind a deepfake. Purely deep-learning-based approaches classify deepfakes using CNNs that, in effect, learn the residuals of the generator. Our key observation is that the spatiotemporal patterns in biological signals can be conceived as a representative projection of those residuals. To justify this observation, we extract PPG cells from real and fake videos and feed them to a state-of-the-art classification network in an attempt to detect which generative model was used to create a given fake video. Our results indicate that our approach detects fake videos with 97.29% accuracy and the source model with 93.39% accuracy. We further evaluate and compare our approach on six datasets to assess its extensibility to new models and its generalizability across skin tones and genders, run ablation studies on its components, and analyze its robustness to compression, landmark noise, and postprocessing operations. The experiments show the superior performance of our proposed approach compared to the state of the art.
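To make the "PPG cell" idea in the abstract concrete, here is a minimal sketch of remote-PPG extraction and cell assembly. It is an illustration only, not the authors' implementation: the signal is approximated as the mean green-channel intensity of a face region (the paper uses a more elaborate extraction over multiple facial ROIs), and the window size, normalization, and spectrum layout are assumptions. The functions `extract_ppg_signal` and `build_ppg_cell` are hypothetical names; the demo uses synthetic frames with an injected periodic pulse instead of a real video.

```python
import numpy as np

def extract_ppg_signal(frames, roi):
    """Mean green-channel intensity of a face ROI per frame:
    a crude remote-PPG proxy (illustrative, not the paper's method)."""
    y0, y1, x0, x1 = roi
    return np.array([f[y0:y1, x0:x1, 1].mean() for f in frames])

def build_ppg_cell(signal, window=64):
    """Stack non-overlapping windows of the raw signal and their
    magnitude spectra into a small 2-D 'cell' that a CNN could ingest,
    loosely following the PPG-cell idea."""
    rows = []
    for i in range(0, len(signal) - window + 1, window):
        w = signal[i:i + window]
        w = (w - w.mean()) / (w.std() + 1e-8)          # per-window normalization
        spec = np.abs(np.fft.rfft(w))[:window // 2]    # magnitude spectrum
        rows.append(np.concatenate([w, spec]))
    return np.stack(rows)

# Synthetic demo: 256 frames of 32x32 RGB noise with a faint ~72 bpm pulse
rng = np.random.default_rng(0)
frames = rng.random((256, 32, 32, 3))
t = np.arange(256)
frames[..., 1] += 0.05 * np.sin(2 * np.pi * 1.2 * t / 30)[:, None, None]

sig = extract_ppg_signal(frames, roi=(4, 28, 4, 28))
cell = build_ppg_cell(sig)
print(cell.shape)  # (4, 96): 4 windows, 64 time samples + 32 spectrum bins each
```

In the paper's pipeline, such cells (computed for real videos and for fakes from each generator) are what the classification network sees, so generator-specific residuals surface as distinctive spatiotemporal/spectral patterns.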
ISSN: 0178-2789, 1432-2315
DOI: 10.1007/s00371-023-02981-0