Deepfake-speech Detection with Pathological Features and Multilayer Perceptron Neural Network


Bibliographic Details
Published in: 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC), pp. 2182-2188
Main Authors: Chaiwongyen, Anuwat; Duangpummet, Suradej; Karnjana, Jessada; Kongprawechnon, Waree; Unoki, Masashi
Format: Conference Proceeding
Language: English
Published: IEEE, 31.10.2023

Summary: Deepfake speech, a misuse of speech technology, is of great concern because it sounds natural and is difficult to detect. Although many methods using various speech features have been proposed, deepfake-speech detection accuracy still needs to improve, especially in real-world scenarios. This paper therefore presents a method for detecting deepfake speech on the basis of pathological features that pathologists use to assess voice quality. Six pathological features (jitter, shimmer, harmonics-to-noise ratio, cepstral harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio) are fed to a multilayer perceptron neural network. We evaluated the proposed method on the Audio Deep Synthesis Detection Challenge dataset. The results indicate that the proposed model can detect deepfake speech: its accuracy, precision, recall, and F1-score all exceeded 98% on the development set, and it outperformed the baseline method on the adaptation set.
ISSN: 2640-0103
DOI: 10.1109/APSIPAASC58517.2023.10317331
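The summary describes the pipeline only at a high level. As an illustration, here is a minimal Python sketch, not the authors' implementation, of three of the six features and of the four reported evaluation metrics. It assumes that per-period glottal durations, per-period peak amplitudes, and the normalized autocorrelation at the pitch lag have already been obtained by a pitch tracker. The function names are hypothetical, and the textbook definitions used here (Praat-style local jitter and shimmer, autocorrelation-based HNR after Boersma) may differ from the exact variants used in the paper. CHNR, NNE, GNE, and the MLP itself are omitted.

```python
import math
from typing import Dict, Sequence


# Hypothetical helpers: textbook definitions, assumed rather than taken from the paper.

def local_jitter(periods: Sequence[float]) -> float:
    """Mean absolute difference between consecutive periods divided by the
    mean period (returned as a ratio, not a percentage)."""
    if len(periods) < 2:
        raise ValueError("need at least two periods")
    diffs = [abs(b - a) for a, b in zip(periods, periods[1:])]
    return (sum(diffs) / len(diffs)) / (sum(periods) / len(periods))


def local_shimmer(amplitudes: Sequence[float]) -> float:
    """Same formula as local jitter, applied to per-period peak amplitudes."""
    if len(amplitudes) < 2:
        raise ValueError("need at least two amplitudes")
    diffs = [abs(b - a) for a, b in zip(amplitudes, amplitudes[1:])]
    return (sum(diffs) / len(diffs)) / (sum(amplitudes) / len(amplitudes))


def hnr_db(r_max: float) -> float:
    """Autocorrelation-based HNR in dB: 10*log10(r / (1 - r)), where r is the
    normalized autocorrelation at the pitch lag, 0 < r < 1."""
    if not 0.0 < r_max < 1.0:
        raise ValueError("r_max must lie strictly between 0 and 1")
    return 10.0 * math.log10(r_max / (1.0 - r_max))


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1-score for a two-class detector;
    'positive' marks the class treated as the detection target."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


if __name__ == "__main__":
    # Toy per-period values, for illustration only.
    print(local_jitter([0.010, 0.011, 0.010]))
    print(local_shimmer([1.0, 0.9, 1.0]))
    print(hnr_db(0.9))
    print(binary_metrics([1, 1, 0, 0], [1, 0, 0, 1]))
```

In a full pipeline, per-utterance values like these would be stacked into a feature vector and passed to an MLP classifier (for example, scikit-learn's `MLPClassifier`); the layer sizes and training setup used in the paper are not given in this record.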