A method to compensate the influence of speech codec in speaker recognition
Published in | International journal of speech technology Vol. 21; no. 4; pp. 975 - 985 |
---|---|
Main Authors | , , , |
Format | Journal Article |
Language | English |
Published | New York, Springer US, 15.12.2018; Springer Nature B.V |
Subjects | |
Summary: | Speaker recognition, the recognition of a person by his or her voice, is a biometric specialty increasingly used in electronic commerce, electronic banking transactions, and forensic investigations, among other areas. It relies on the discriminative information contained in a person's speech, and its main challenge is “session variability”: the variability between different speech samples of the same person used for training and evaluation. When speech is transmitted over the internet, for example, the coding–decoding process (“codec”) causes a loss of this information and reduces the effectiveness of speaker recognition. Some methods have been proposed to mitigate this effect. This work studies how strongly some commonly used codec types degrade this information and proposes a solution to compensate for the session variability introduced by the codec. The influence of several codec types on sample quality was first evaluated with a set of synthesized speech samples. Experiments were then carried out with speech samples from international competitions, retransmitted over two different codecs, and the effect on speaker recognition effectiveness was measured. Finally, the variability compensation was applied, improving recognition effectiveness, as measured by the equal error rate, by 20.8% for the G.722 codec and 27.8% for the GSM 6.20 codec. |
---|---|
ISSN: | 1381-2416, 1572-8110 |
DOI: | 10.1007/s10772-018-9547-0 |