Simulation of talking faces in the human brain improves auditory speech recognition

Bibliographic Details
Published in	Proceedings of the National Academy of Sciences (PNAS), Vol. 105, no. 18, pp. 6747-6752
Main Authors	von Kriegstein, Katharina; Dogan, Özgür; Grüter, Martina; Giraud, Anne-Lise; Kell, Christian A; Grüter, Thomas; Kleinschmidt, Andreas; Kiebel, Stefan J
Format	Journal Article
Language	English
Published	United States: National Academy of Sciences, 06.05.2008
Series	From the Cover
Subjects	Adult; Auditory perception; Auditory Perception - physiology; Aural learning; Behavior; Biological Sciences; Brain; Brain - physiology; Cognition & reasoning; Cognitive science; Cognitive Sciences; Communication; Computer Simulation; Correlations; Face; Female; Humans; Identity; Life Sciences; Magnetic Resonance Imaging; Male; Middle Aged; Neurons and Cognition; Neuroscience; Neurosciences; people; prediction; Prosopagnosia; Simulation; speech; Speech Perception - physiology; Speech recognition; tongue; Vehicles; Visual perception; Voice recognition
Summary:	Human face-to-face communication is essentially audiovisual. Typically, people talk to us face-to-face, providing concurrent auditory and visual input. Understanding someone is easier when there is visual input, because visual cues like mouth and tongue movements provide complementary information about speech content. Here, we hypothesized that, even in the absence of visual input, the brain optimizes both auditory-only speech and speaker recognition by harvesting speaker-specific predictions and constraints from distinct visual face-processing areas. To test this hypothesis, we performed behavioral and neuroimaging experiments in two groups: subjects with a face recognition deficit (prosopagnosia) and matched controls. The results show that observing a specific person talking for 2 min improves subsequent auditory-only speech and speaker recognition for this person. In both prosopagnosics and controls, behavioral improvement in auditory-only speech recognition was based on an area typically involved in face-movement processing. Improvement in speaker recognition was only present in controls and was based on an area involved in face-identity processing. These findings challenge current unisensory models of speech processing, because they show that, in auditory-only speech, the brain exploits previously encoded audiovisual correlations to optimize communication. We suggest that this optimization is based on speaker-specific audiovisual internal models, which are used to simulate a talking face.
Bibliography:	PMCID: PMC2365564. Author contributions: K.v.K. designed research; K.v.K., Ö.D., M.G., A.-L.G., C.A.K., T.G., and A.K. performed research; S.J.K. contributed new reagents/analytic tools; K.v.K. and Ö.D. analyzed data; and K.v.K. and S.J.K. wrote the paper. Edited by Dale Purves, Duke University Medical Center, Durham, NC, and approved March 15, 2008.
ISSN:	0027-8424; 1091-6490
DOI:	10.1073/pnas.0710826105