Analysis of the Effects of Supraglottal Tract Surgical Procedures in Automatic Speaker Recognition Performance

Bibliographic Details
Published in	IEEE/ACM transactions on audio, speech, and language processing Vol. 28; pp. 798 - 812
Main Authors	Moro-Velazquez, Laureano; Hernandez-Garcia, Estefania; Gomez-Garcia, Jorge A.; Godino-Llorente, Juan I.; Dehak, Najim
Format	Journal Article
Language	English
Published	Piscataway: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2020
Subjects	Artificial neural networks; Automatic speaker recognition; Discriminant analysis; Error analysis; Hospitals; Pathology; Performance evaluation; septoplasty; sinus surgery; Speaker recognition; Speech processing; Speech recognition; Surgery; tonsillectomy
Summary:	This article evaluates how three surgical procedures that modify speakers' supraglottal tract structures affect the performance of state-of-the-art automatic speaker recognition schemes. To do so, a new corpus (Cuco) was recorded, containing the speech of 107 speakers before and after surgery. Speakers were divided into four groups depending on the type of surgery: tonsillectomy, functional endoscopic sinus surgery (FESS), septoplasty, and controls. The analyzed speaker recognition schemes were i-vectors, i-vectors with a supervised Universal Background Model, i-vectors employing Time-delay Deep Neural Networks, and x-vectors. In all cases, probabilistic linear discriminant analysis was employed in the back-end. Results show changes after surgery in the speech of patients who underwent tonsillectomy or FESS, in contrast to controls and patients who had a septoplasty, for whom no significant variations are observed. These changes increase the Equal Error Rate (EER) of the analyzed speaker recognition schemes for the tonsillectomy and FESS groups when employing enrollment data recorded before the surgery. Moreover, surgery has a similar influence on the speech of female and male speakers with respect to the analyzed schemes. Consequently, the results suggest that it is advisable to update the speaker's enrollment speech once three months have passed after supraglottal tract surgery, to ensure that the effects of the operation and the post-operative recovery period do not influence the performance of automatic speaker recognition systems.
ISSN:	2329-9290 2329-9304
DOI:	10.1109/TASLP.2020.2967567
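
Note on the metric: the Summary reports results as Equal Error Rate (EER), the operating point at which the false acceptance rate and the false rejection rate of a verification system coincide. The sketch below shows one common way to estimate it from trial scores. It is an illustration only, not the authors' evaluation code; the function name and the example scores are hypothetical and are not taken from the Cuco corpus.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Estimate the EER and its threshold from verification trial scores.

    A trial is accepted when its score is >= the threshold.
    target_scores: scores of same-speaker trials (hypothetical input).
    nontarget_scores: scores of different-speaker trials (hypothetical input).
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    best = None  # (gap between FAR and FRR, EER estimate, threshold)
    # Every observed score is a candidate decision threshold.
    for thr in sorted(set(target_scores) | set(nontarget_scores)):
        # False rejection: same-speaker trials scored below the threshold.
        frr = sum(s < thr for s in target_scores) / len(target_scores)
        # False acceptance: different-speaker trials at or above the threshold.
        far = sum(s >= thr for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            # With finitely many trials the two rates may not cross exactly,
            # so their mean at the closest point is used as the estimate.
            best = (gap, (far + frr) / 2, thr)
    return best[1], best[2]


# Made-up scores, e.g. from a PLDA back-end; higher means "same speaker".
example_target = [2.1, 1.7, 0.4, 1.2, -0.3]
example_nontarget = [-1.5, -0.8, 0.6, -2.0, -0.1]
eer, threshold = equal_error_rate(example_target, example_nontarget)
print(f"EER = {eer:.1%} at threshold {threshold}")
```

In a setting like the one the Summary describes, the same-speaker trials would pair enrollment speech recorded before surgery with test speech recorded after it, so a shift in the voice would lower those scores and raise the EER.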