Speaker de-identification via voice transformation

Bibliographic Details
Published in 2009 IEEE Workshop on Automatic Speech Recognition & Understanding, pp. 529-533
Main Authors Jin, Qin; Toth, A.R.; Schultz, T.; Black, A.W.
Format Conference Proceeding
Language English
Published IEEE 01.12.2009
Summary: It is a common feature of modern automated voice-driven applications and services to record and transmit a user's spoken request. At the same time, several domains and applications may require keeping the content of the user's request confidential while also protecting the speaker's identity. This requires a technology that allows the speaker's voice to be de-identified in the sense that the voice sounds natural and intelligible but does not reveal the identity of the speaker. In this paper we investigate different voice transformation strategies on a large population of speakers to disguise the speakers' identities while preserving the intelligibility of the voices. We apply two automatic speaker identification approaches to verify the success of de-identification with voice transformation: a GMM-based approach and a phonetic approach. The evaluation based on the automatic speaker identification systems verifies that the proposed voice transformation technique enables transmission of the content of the users' spoken requests while successfully concealing their identities. Also, the results indicate that different speakers still sound distinct after the transformation. Furthermore, we carried out a human listening test that showed the transformed speech to be both intelligible and securely de-identified, as it hid the identity of the speakers even from listeners who knew the speakers very well.
ISBN: 1424454786, 9781424454785
DOI: 10.1109/ASRU.2009.5373356