Complementary models for audio-visual speech classification

Bibliographic Details
Published in: International Journal of Speech Technology, Vol. 25, No. 1, pp. 231–249
Main Authors: Sad, Gonzalo D.; Terissi, Lucas D.; Gómez, Juan C.
Format: Journal Article
Language: English
Published: New York: Springer US, 01.03.2022 (Springer Nature B.V.)
Summary: A novel scheme for disambiguating conflicting classification results in Audio-Visual Speech Recognition applications is proposed in this paper. The classification scheme can be implemented with both generative and discriminative models and can be used with different input modalities, viz. audio only, visual only, and combined audio-visual information. The proposed scheme consists of the cascade connection of a standard classifier, trained with instances of each particular class, followed by a complementary model, trained with instances of all the remaining classes. The performance of the proposed recognition system is evaluated on three publicly available audio-visual datasets, using a generative model, namely a hidden Markov model, and three discriminative techniques, viz. random forests, support vector machines, and adaptive boosting. The experimental results are promising in the sense that, for all three datasets, the different models, and the different input modalities, improvements in the recognition rates are achieved in comparison to other methods reported in the literature on the same datasets.
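The core idea in the summary above, a per-class "standard" model paired with a "complementary" model trained on all remaining classes, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the toy one-dimensional Gaussian models stand in for the paper's HMMs or discriminative classifiers, and the combination rule (standard score minus complementary score) is an assumption made here for concreteness.

```python
import math

class GaussianModel:
    """Toy 1-D Gaussian stand-in for a per-class model (e.g. an HMM)."""

    def fit(self, samples):
        n = len(samples)
        self.mean = sum(samples) / n
        var = sum((x - self.mean) ** 2 for x in samples) / n
        self.var = max(var, 1e-6)  # floor the variance for stability
        return self

    def log_likelihood(self, x):
        return -0.5 * (math.log(2 * math.pi * self.var)
                       + (x - self.mean) ** 2 / self.var)

def train_cascade(data_by_class):
    """For each class, fit a standard model on its own instances and a
    complementary model on the instances of all the remaining classes."""
    standard, complementary = {}, {}
    for c, samples in data_by_class.items():
        standard[c] = GaussianModel().fit(samples)
        rest = [x for c2, s in data_by_class.items() if c2 != c for x in s]
        complementary[c] = GaussianModel().fit(rest)
    return standard, complementary

def classify(x, standard, complementary):
    # Prefer the class whose own model scores high AND whose
    # complementary ("everything else") model scores low.
    return max(standard, key=lambda c: standard[c].log_likelihood(x)
                                       - complementary[c].log_likelihood(x))

# Hypothetical scalar features for three viseme-like classes.
data = {"ba": [0.9, 1.0, 1.1], "pa": [1.9, 2.0, 2.1], "ma": [3.0, 3.1, 2.9]}
std, comp = train_cascade(data)
print(classify(2.05, std, comp))  # → "pa"
```

When the standard models alone produce conflicting or near-tied scores, the complementary term acts as the disambiguator: a class is penalized if the model of "everything that is not this class" also explains the input well.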
ISSN: 1381-2416; 1572-8110
DOI: 10.1007/s10772-021-09944-7