The effects of face mask on speech production and its implication for forensic speaker identification-A cross-linguistic study

Bibliographic Details
Published in	PloS one Vol. 18; no. 3; p. e0283724
Main Authors	Geng, Puyang; Lu, Qimeng; Guo, Hong; Zeng, Jinhua
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 30.03.2023
Subjects	Accuracy; Acoustic analysis; Acoustic phonetics; Acoustic properties; Acoustics; Algorithms; Analysis; Bayes Theorem; Biology and Life Sciences; Chinese languages; Classification; Classifiers; Comparative linguistics; Computer and Information Sciences; Continuous speech; Coronaviruses; COVID-19; Discriminant analysis; Disease transmission; English language; Evaluation; Face; Female; Forensic linguistics; Forensic science; Fundamental frequency; Humans; Identification; Intelligibility; Jitter; Language; Linguistics; Machine learning; Male; Mandarin; Masks; Medical research; Medicine and Health Sciences; Phonetics; Physical Sciences; Protective equipment; Research and Analysis Methods; Shimmer; Social Sciences; Sound intensity; Speaker identification; Speaking; Speech; Speech Acoustics; Speech Intelligibility; Speech Perception; Speech production; Speech recognition; Supervised learning; Support vector machines; Vibration; Voice recognition; China; United Kingdom

More Information
Summary:	This study examines the effects of wearing a face mask on speech production in Mandarin Chinese and English, and on the automatic classification of mask/no-mask speech and of individual speakers. A cross-linguistic study of mask speech in Mandarin Chinese and English was conducted. Continuous speech from phonetically balanced texts in Chinese and English versions was recorded from thirty native speakers of Mandarin Chinese (15 males and 15 females), with and without a surgical mask. Acoustic analyses showed that mask speech exhibited higher F0, intensity, and HNR, and lower jitter and shimmer, than no-mask speech for Mandarin Chinese, whereas higher HNR and lower jitter and shimmer were observed for English mask speech. Classification analyses using four supervised learning algorithms (Linear Discriminant Analysis, Naïve Bayes Classifier, Random Forest, and Support Vector Machine) yielded poor performance (accuracy below 50%) in classifying speech with and without a face mask, and highly variable accuracies (ranging from 40% to 89.2%) in identifying individual speakers. These findings imply that speakers tend to make acoustic adjustments to improve their speech intelligibility when wearing a surgical mask. However, the compensation strategies differed across languages: Mandarin speech was produced with higher F0, intensity, and HNR, while English speech was produced with higher HNR. In addition, the highly variable speaker identification accuracies suggest that a surgical mask may affect the overall performance of automatic speaker recognition. In general, wearing a surgical mask appears to affect both acoustic-phonetic and automatic speaker recognition approaches to some extent, which calls for particular caution in real-case forensic speaker identification.
Competing Interests:	The authors have declared that no competing interests exist.
ISSN:	1932-6203
DOI:	10.1371/journal.pone.0283724