Deepfake-speech Detection with Pathological Features and Multilayer Perceptron Neural Network
Published in | 2023 Asia Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA ASC) pp. 2182 - 2188 |
---|---|
Main Authors | , , , , |
Format | Conference Proceeding |
Language | English |
Published | IEEE, 31.10.2023 |
Summary: | Deepfake speech, a misuse of speech technology, is of great concern because it sounds natural and is difficult to detect. Although many methods using various speech features have been proposed, deepfake-speech detection accuracy still needs to improve, especially in real-world scenarios. This paper therefore presents a method for detecting deepfake speech based on pathological features, which pathologists use to assess voice quality. Six pathological features (jitter, shimmer, harmonics-to-noise ratio, cepstral-harmonics-to-noise ratio, normalized noise energy, and glottal-to-noise excitation ratio) are fed to a multilayer perceptron neural network. The proposed method was evaluated on the Audio Deep Synthesis Detection Challenge dataset. The results indicate that the proposed model can be used for detecting deepfake speech: its accuracy, precision, recall, and F1-score all exceeded 98% on the development set, and it outperformed the baseline method on the adaptation set. |
---|---|
ISSN: | 2640-0103 |
DOI: | 10.1109/APSIPAASC58517.2023.10317331 |
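
Two of the six features named in the summary, jitter and shimmer, measure cycle-to-cycle irregularity in pitch period and peak amplitude. The sketch below illustrates the widely used "local" definitions (mean absolute difference between consecutive cycles, divided by the mean). It is an assumption that these are the variants used in the paper, since this record does not give the formulas; the function names and sample values are hypothetical, and extraction of the periods and amplitudes from audio is not shown.

```python
from statistics import mean


def local_perturbation(values):
    """Mean absolute difference between consecutive values, divided by their mean."""
    if len(values) < 2:
        raise ValueError("need at least two glottal cycles")
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return mean(diffs) / mean(values)


def local_jitter(periods):
    """Local jitter from consecutive pitch periods (seconds). Assumed definition."""
    return local_perturbation(periods)


def local_shimmer(amplitudes):
    """Local shimmer from consecutive cycle peak amplitudes. Assumed definition."""
    return local_perturbation(amplitudes)


# Hypothetical measurements for four consecutive glottal cycles.
periods = [0.0100, 0.0102, 0.0099, 0.0101]
amplitudes = [0.80, 0.78, 0.82, 0.79]

# Two entries of the six-dimensional feature vector described in the summary.
features = [local_jitter(periods), local_shimmer(amplitudes)]
```

In the approach the summary describes, values like these would be combined with the four noise-related measures to form the input vector of the multilayer perceptron.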