Short-time speaker verification with different speaking style utterances



Bibliographic Details
Published in	PLOS ONE Vol. 15; no. 11; p. e0241809
Main Authors	Mao, Hongwei; Shi, Yan; Liu, Yue; Wei, Linqiang; Li, Yijie; Long, Yanhua
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 11.11.2020
Subjects	Access control; Audio equipment; Biology and Life Sciences; Computer and Information Sciences; Engineering and Technology; Human-computer interaction; Humans; Identification and classification; Laboratories; Methods; Normal Distribution; Physical Sciences; Probabilistic models; Public speakers; Reading; Singing; Social Sciences; Speaking; Speech; Speech Acoustics; Speech acts (Linguistics); Speech Perception; Speech recognition; Spoofing; Students; Verification; Verification (Logic); Websites; China

Summary:	In recent years, great progress has been made in the technical aspects of automatic speaker verification (ASV). However, the promotion of ASV technology remains very challenging, because most technologies are still very sensitive to new, unknown and spoofing conditions. Most previous studies focused on extracting target speaker information from natural speech. This paper aims to design a new ASV corpus with multiple speaking styles and to investigate the robustness of ASV to these different speaking styles. We first release this corpus on the Zenodo website for public research; in it, each speaker has several text-dependent and text-independent singing, humming and normal reading speech utterances. Then, we investigate the speaker discrimination of each speaking style in the feature space. Furthermore, the intra- and inter-speaker variabilities within each speaking style and across speaking styles are investigated in both text-dependent and text-independent ASV tasks. A conventional Gaussian mixture model (GMM) and the state-of-the-art x-vector are used to build ASV systems. Experimental results show that the voiceprint information in humming and singing speech is more distinguishable than that in normal reading speech for conventional ASV systems. Furthermore, we find that combining the three speaking styles can significantly improve the x-vector based ASV system, even when only limited gains are obtained by conventional GMM-based systems.
Bibliography:	Competing Interests: The affiliation ‘Unisound AI Technology Co., Ltd., Beijing, China’ is not a funder of this work. Yijie Li is a staff member of this company; his contribution to this work was the analysis of experimental results and suggestions during the writing of the paper. There are no patents, products in development or marketed products to declare. This does not alter our adherence to PLOS ONE policies on sharing data and materials.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0241809