Audio Tampering Detection Based on Shallow and Deep Feature Representation Learning

Bibliographic Details
Published in	arXiv.org
Main Authors	Wang, Zhifeng; Yang, Yao; Zeng, Chunyan; Kong, Shuai; Feng, Shixiong; Zhao, Nan
Format	Paper
Language	English
Published	Ithaca: Cornell University Library, arXiv.org, 19.10.2022
Subjects	Accuracy; Feature extraction; Machine learning; Representations
Summary:	Digital audio tampering detection can be used to verify the authenticity of digital audio. However, most current methods either compare the continuity of a recording's electric network frequency (ENF) visually against a standard ENF database or extract features for classification by machine learning methods. ENF databases are usually hard to obtain, visual methods have weak feature representation, and machine learning methods lose more information in their features, resulting in low detection accuracy. This paper proposes a method that fuses shallow and deep features to make full use of ENF information, exploiting the complementary nature of features at different levels to describe more accurately the inconsistencies that tampering operations introduce into raw digital audio. The method achieves 97.03% accuracy on three classic databases: Carioca 1, Carioca 2, and New Spanish. In addition, we achieve an accuracy of 88.31% on the newly constructed database GAUDI-DI. Experimental results show that the proposed method is superior to the state-of-the-art method.
ISSN:	2331-8422