Robust multi-modal person identification with tolerance of facial expression

Bibliographic Details
Published in: 2004 IEEE International Conference on Systems, Man and Cybernetics, Vol. 1, pp. 580-585
Main Authors: Fox, N.A., Reilly, R.B.
Format: Conference Proceeding
Language: English
Published: Piscataway, NJ: IEEE, 2004
Summary: This work describes audio-visual speaker identification experiments carried out on a large data set of 251 subjects. Both the audio and visual modalities are modeled with hidden Markov models; the visual modality uses the speaker's lip information. Both modalities are degraded to emulate a train/test mismatch. The fusion method adapts automatically, using classifier-score reliability estimates from both modalities, and yields higher audio-visual accuracy than either modality alone at every tested level of audio and visual degradation. A maximum visual identification accuracy of 86% was achieved. This result is comparable to the performance of systems that use the entire face, and supports the hypothesis that the described system would be tolerant to varying facial expression, since only the information around the speaker's lips is employed.
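The abstract only sketches the fusion rule, so the snippet below is a minimal, assumed illustration of reliability-weighted late fusion rather than the authors' actual implementation: per-speaker HMM scores from each modality are mapped to pseudo-posteriors, a simple dispersion-based reliability measure is computed for each modality (both the softmax mapping and this measure are assumptions), and the final decision uses a convex combination weighted by those reliabilities.

```python
import numpy as np

def softmax(log_likelihoods):
    """Map per-speaker log-likelihoods to pseudo-posteriors (assumed step)."""
    e = np.exp(log_likelihoods - np.max(log_likelihoods))
    return e / e.sum()

def reliability(posteriors):
    """Hypothetical reliability estimate: gap between the best posterior and
    the mean of the rest. A flat, ambiguous score vector yields a value near 0."""
    p = np.sort(posteriors)[::-1]
    return float(p[0] - p[1:].mean())

def fuse(audio_loglik, visual_loglik):
    """Reliability-weighted late fusion of two modality score vectors."""
    p_a, p_v = softmax(audio_loglik), softmax(visual_loglik)
    r_a, r_v = reliability(p_a), reliability(p_v)
    alpha = r_a / (r_a + r_v + 1e-12)          # audio weight in [0, 1]
    fused = alpha * p_a + (1.0 - alpha) * p_v
    return int(np.argmax(fused)), alpha

# Toy example (made-up numbers): the audio scores are discriminative while the
# visual scores are nearly flat, so the fusion weight leans towards audio.
audio = np.array([-110.0, -95.0, -118.0, -120.0])
visual = np.array([-60.0, -60.5, -59.8, -60.2])
speaker, alpha = fuse(audio, visual)
print(f"identified speaker index: {speaker}, audio weight: {alpha:.2f}")
```

Under degradation of one modality, its scores flatten, its reliability estimate drops, and the weight shifts towards the cleaner modality, which is the behaviour the abstract attributes to the adaptive fusion scheme.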
ISBN: 0780385667, 9780780385665
ISSN: 1062-922X
DOI: 10.1109/ICSMC.2004.1398362