Speaker verification in score-ageing-quality classification space

Bibliographic Details
Published in Computer speech & language Vol. 27; no. 5; pp. 1068–1084
Main Authors Kelly, Finnian, Drygajlo, Andrzej, Harte, Naomi
Format Journal Article
Language English
Published Kidlington Elsevier Ltd 01.08.2013
Elsevier
Summary:
► A speaker ageing database of 18 adults across a 30–60 year time lapse is presented.
► A speaker verification evaluation of this ageing data results in a high error rate.
► The dependency between verification score and ageing progression is analysed.
► Verification score is shown to be correlated with measures of recording quality.
► A score-ageing-quality decision boundary improves significantly over the baseline.

A challenge in automatic speaker verification is to create a system that is robust to the effects of vocal ageing. To observe the ageing effect, a speaker's voice must be analysed over a period of time, during which variation in the quality of the voice samples is likely to be encountered. Thus, in dealing with the ageing problem, the related issue of quality must also be addressed. We present a solution to speaker verification across ageing by using a stacked classifier framework to combine ageing and quality information with the scores of a baseline classifier. In tandem, the Trinity College Dublin Speaker Ageing database of 18 speakers, each covering a 30–60 year time range, is presented. An evaluation of a baseline Gaussian Mixture Model–Universal Background Model (GMM–UBM) system using this database demonstrates a progressive degradation in genuine speaker verification scores as ageing progresses. Consequently, applying a conventional threshold, determined using scores at the time of enrolment, results in poor long-term performance. The influence of quality on verification scores is investigated via a number of quality measures. Alongside established signal-based measures, a new model-based measure, Wnorm, is proposed, and its utility is demonstrated on the CSLU database. Combining ageing information with quality measures and the scores from the GMM–UBM system, a verification decision boundary is created in score-ageing-quality space. The best performance is achieved by using scores and ageing in conjunction with the new Wnorm quality measure, reducing verification error by 45% relative to the baseline. This work represents the first comprehensive analysis of speaker verification on a longitudinal speaker database and successfully addresses the associated variability from ageing and quality artefacts.
ISSN: 0885-2308, 1095-8363
DOI: 10.1016/j.csl.2012.12.005
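
The summary's central idea, replacing a fixed enrolment-time score threshold with a decision boundary in score-ageing-quality space, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the trials are synthetic, the drift and quality coefficients are invented, the quality value is a generic stand-in rather than Wnorm, and logistic regression is swapped in as the second-stage classifier because the summary does not name the one used in the stacked framework.

```python
import math
import random

def make_trials(n, seed):
    """Synthetic trials: (score, years_since_enrolment, quality, is_genuine).
    All coefficients are illustrative assumptions, not values from the article."""
    rng = random.Random(seed)
    trials = []
    for i in range(n):
        genuine = i % 2 == 0
        years = rng.uniform(0.0, 40.0)
        quality = rng.uniform(-1.0, 1.0)
        noise = rng.uniform(-0.3, 0.3)
        if genuine:
            # Genuine scores drift downwards as ageing progresses.
            score = 2.0 - 0.05 * years + 0.5 * quality + noise
        else:
            score = -1.0 + 0.5 * quality + noise
        trials.append((score, years, quality, genuine))
    return trials

def features(trial):
    score, years, quality, _ = trial
    return [1.0, score, years / 40.0, quality]  # leading 1.0 is the bias term

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_stacked(trials, steps=500, lr=0.5):
    """Batch gradient descent for logistic regression (stand-in second stage)."""
    w = [0.0] * 4
    for _ in range(steps):
        grad = [0.0] * 4
        for t in trials:
            x = features(t)
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - float(t[3])
            for j in range(4):
                grad[j] += err * x[j]
        w = [wi - lr * g / len(trials) for wi, g in zip(w, grad)]
    return w

def error_rate(trials, accept):
    wrong = sum(1 for t in trials if accept(t) != t[3])
    return wrong / len(trials)

train = make_trials(400, seed=0)
test = make_trials(400, seed=1)

# Baseline: fixed threshold at the midpoint of the class means at enrolment.
enrol_threshold = 0.5
baseline_err = error_rate(test, lambda t: t[0] > enrol_threshold)

# Stacked decision: linear boundary in (score, ageing, quality) space.
w = train_stacked(train)
stacked_err = error_rate(
    test, lambda t: sum(wi * xi for wi, xi in zip(w, features(t))) > 0.0
)

print(f"baseline error: {baseline_err:.3f}, stacked error: {stacked_err:.3f}")
```

In this toy setting the fixed threshold rejects genuine speakers whose scores have drifted below it after many years, while the learned boundary can tilt with the ageing and quality axes. The numbers it prints say nothing about the article's reported results.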