Human Machine Interaction via Visual Speech Spotting

Bibliographic Details
Published in	Advanced Concepts for Intelligent Vision Systems, pp. 566-574
Main Authors	Rekik, Ahmed; Ben-Hamadou, Achraf; Mahdi, Walid
Format	Book Chapter
Language	English
Published	Cham: Springer International Publishing, 2015
Series	Lecture Notes in Computer Science
Subjects	Human machine interaction; Kinect; RGB-camera; Visual speech features; Visual speech spotting
Summary:	In this paper, we propose an automatic visual speech spotting system adapted for RGB-D cameras and based on Hidden Markov Models (HMMs). Our system consists of two main processing blocks: visual feature extraction, and speech spotting and recognition. In the feature extraction step, the speaker's face pose is estimated using a 3D face model that includes a rectangular 3D mouth patch, which is used to precisely extract the mouth region. Spatio-temporal features are then computed on the extracted mouth region. In the second step, the speech video is segmented by finding the starting and ending points of meaningful utterances, which are then recognized using the Viterbi algorithm. The proposed system is mainly evaluated on an extended version of the MIRACL-VC1 dataset. Experimental results demonstrate that the proposed system can segment and recognize key utterances with a recognition rate of 83 % and a reliability of 81.4 %.
ISBN:	9783319259024; 3319259024
ISSN:	0302-9743; 1611-3349
DOI:	10.1007/978-3-319-25903-1_49
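
The summary mentions that utterances are recognized with the Viterbi algorithm over HMMs. As an illustration only, and not the authors' implementation, the following Python sketch shows Viterbi decoding on a toy two-state HMM. The state names ("silence", "utterance"), the quantized mouth observations ("closed", "open"), and every probability value are hypothetical assumptions chosen for the example; the chapter itself uses spatio-temporal features computed on the mouth region, which are not modelled here.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Return (best_log_prob, best_path) for a discrete-observation HMM."""
    # scores[s] = best log-probability of any state path ending in s so far
    scores = {s: log_start[s] + log_emit[s][observations[0]] for s in states}
    backpointers = []
    for obs in observations[1:]:
        new_scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for reaching s at this time step
            prev = max(states, key=lambda p: scores[p] + log_trans[p][s])
            new_scores[s] = scores[prev] + log_trans[prev][s] + log_emit[s][obs]
            pointers[s] = prev
        scores = new_scores
        backpointers.append(pointers)
    # Trace the best path backwards from the best final state
    last = max(states, key=lambda s: scores[s])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return scores[last], path


def to_log(table):
    """Convert a (possibly nested) dict of probabilities to log-probabilities."""
    return {k: to_log(v) if isinstance(v, dict) else math.log(v)
            for k, v in table.items()}


# Hypothetical toy model: all names and numbers are assumptions for illustration.
STATES = ["silence", "utterance"]
LOG_START = to_log({"silence": 0.8, "utterance": 0.2})
LOG_TRANS = to_log({
    "silence": {"silence": 0.7, "utterance": 0.3},
    "utterance": {"silence": 0.3, "utterance": 0.7},
})
LOG_EMIT = to_log({
    "silence": {"closed": 0.9, "open": 0.1},
    "utterance": {"closed": 0.2, "open": 0.8},
})

frames = ["closed", "open", "open", "closed"]
best_log_prob, best_path = viterbi(frames, STATES, LOG_START, LOG_TRANS, LOG_EMIT)
print(best_path)
```

In this toy run the decoded path labels the two middle frames as "utterance" and the outer frames as "silence", which loosely mirrors the idea of locating the starting and ending points of an utterance. Working in log-probabilities avoids numerical underflow on longer sequences.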