Feature selection for speaker verification using genetic programming

Bibliographic Details
Published in: Evolutionary intelligence, Vol. 10, no. 1-2, pp. 1-21
Main Authors: Loughran, Róisín; Agapitos, Alexandros; Kattan, Ahmed; Brabazon, Anthony; O’Neill, Michael
Format: Journal Article
Language: English
Published: Berlin/Heidelberg, Springer Berlin Heidelberg, 01.07.2017
Springer Nature B.V.
Summary: We present a study of feature selection from high-performing models evolved using genetic programming (GP) on the problem of automatic speaker verification (ASV). ASV is a highly unbalanced binary classification problem in which a given speaker must be verified against everyone else. We evolve classification models for 10 individual speakers using a variety of fitness functions and data sampling techniques, and examine how well each model generalises on a 1:9 unbalanced set. We find a significant difference between training and test performance, which may indicate overfitting in the models. Using only the best-generalising models, we examine two methods for selecting the most important features. We then compare the performance of a number of tuned machine learning classifiers using the full 275 features and a reduced set of 20 features from each of the two feature selection methods. Results show that using only the top 20 features found in high-performing GP programs led to test classifications that are as good as, or better than, those obtained using all data in the majority of experiments undertaken. Classification accuracy varies considerably between speakers across all experiments, showing that some speakers are easier to classify than others. This indicates that in such real-world classification problems, the content and quality of the original data have a very strong influence on the quality of the results obtainable.
ISSN: 1864-5909, 1864-5917
DOI: 10.1007/s12065-016-0150-5
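
The summary mentions selecting the most important features from high-performing GP programs but does not say how the two selection methods work. As an illustration only, the sketch below shows one plausible approach: counting how often each feature terminal occurs across a set of evolved programs and keeping the k most frequent. The function name `top_features_by_frequency`, the `x<i>` terminal naming, and the example programs are all hypothetical and are not taken from the article.

```python
import re
from collections import Counter


def top_features_by_frequency(programs, k=20):
    """Return the indices of the k most frequently used features.

    Assumption (not from the article): each program is a string in which
    feature i appears as the token "x<i>", e.g. "add(x3, mul(x17, x3))".
    """
    counts = Counter()
    for program in programs:
        # Count every occurrence of a feature terminal in this program.
        counts.update(int(i) for i in re.findall(r"\bx(\d+)\b", program))
    # Sort by descending count, then ascending index so ties are deterministic.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [index for index, _ in ranked[:k]]


# Hypothetical best-generalising programs, written as expression strings.
programs = [
    "add(x3, mul(x17, x3))",
    "sub(x17, div(x250, x3))",
    "mul(x8, x17)",
]
selected = top_features_by_frequency(programs, k=2)
print(selected)
```

In a setting like the one summarised, the selected indices would then be used to build the reduced 20-feature data set given to the downstream classifiers. Raw occurrence counts are only one possible importance measure; weighting each program by its test fitness would be another reasonable choice.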