Automatic lipreading using convolutional neural networks and orthogonal moments

Bibliographic Details
Published in: Mathematical Modeling and Computing, Vol. 12, No. 1, pp. 90–100
Main Authors: Ait Khayi, Y., El Ogri, O., El-Mekkaoui, J., Benslimane, M., Hjouji, A.
Format: Journal Article
Language: English
Published: 2025

Summary: Recently, understanding speech from a speaker's mouth using only visual interpretation of lip movements has become one of the most challenging computer vision tasks. In the present paper, we propose a new approach named Optimized Quaternion Meixner Moments Convolutional Neural Networks (OQMMCNN) in order to develop a lipreading system based only on video images. This approach relies on Quaternion Meixner Moments (QMMs), which we use as filters in the Convolutional Neural Network (CNN) architecture. In addition, we use the Grey Wolf Optimization algorithm (GWO) to ensure high classification accuracy by optimizing the local parameters of the QMM filters. We show that this method is an effective way to reduce the high dimensionality of the video images and the training time. The approach is tested on a public dataset and compared with methods from the literature that use complex models and deep architectures.
ISSN: 2312-9794, 2415-3788
DOI: 10.23939/mmc2025.01.090
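
The abstract above describes the OQMMCNN architecture only at a high level: moment-based filters inserted into a CNN as a fixed filter bank, with the Grey Wolf Optimization algorithm tuning the filter parameters rather than learning them by backpropagation. The snippet below is a minimal PyTorch-style sketch of that general pattern, not the authors' implementation: placeholder_moment_filter_bank, the layer sizes, and the orthonormal stand-in kernels are illustrative assumptions, and the actual Quaternion Meixner Moment filters and the GWO parameter search are defined only in the paper.

# A minimal sketch (assumptions labelled below), not the authors' implementation:
# a precomputed, frozen filter bank used as the first convolutional layer of a
# small CNN, with only the later layers trained. The orthonormal kernels here
# are a stand-in built from a QR decomposition; the actual Quaternion Meixner
# Moment filters and the Grey Wolf optimization of their parameters are
# described in the paper, not reproduced here.
import numpy as np
import torch
import torch.nn as nn

def placeholder_moment_filter_bank(num_filters: int, size: int) -> torch.Tensor:
    """Hypothetical stand-in for a QMM filter bank: orthonormal size x size kernels."""
    rng = np.random.default_rng(0)
    # QR factorization of a random square matrix gives an orthogonal matrix,
    # so its rows form an orthonormal set we can reshape into 2-D kernels.
    q, _ = np.linalg.qr(rng.standard_normal((size * size, size * size)))
    bank = q[:num_filters].reshape(num_filters, 1, size, size)
    return torch.tensor(bank, dtype=torch.float32)

class MomentFilterCNN(nn.Module):
    def __init__(self, num_filters: int = 8, kernel_size: int = 5, num_classes: int = 10):
        super().__init__()
        # First layer: fixed (non-trainable) moment-based filters.
        self.moment_conv = nn.Conv2d(1, num_filters, kernel_size, bias=False)
        with torch.no_grad():
            self.moment_conv.weight.copy_(
                placeholder_moment_filter_bank(num_filters, kernel_size))
        self.moment_conv.weight.requires_grad = False
        # Trainable part of the network.
        self.features = nn.Sequential(
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(num_filters, 16, 3),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.moment_conv(x)      # fixed moment-filter responses
        x = self.features(x)         # trainable feature extraction
        return self.classifier(x.flatten(1))

model = MomentFilterCNN()
frames = torch.randn(2, 1, 64, 64)   # e.g. two grayscale mouth-region frames
print(model(frames).shape)           # torch.Size([2, 10])

Freezing the first layer is what makes the filter bank act as a fixed feature extractor and keeps the number of trainable parameters small; in the paper, the analogous tunable quantities are the local parameters of the QMM filters, which GWO searches instead of gradient descent.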