Lip reading with Hahn Convolutional Neural Networks
Published in | Image and Vision Computing, Vol. 88, pp. 76-83 |
---|---|
Format | Journal Article |
Language | English |
Published | Elsevier B.V., 01.08.2019 |
Summary: | Lipreading, or visual speech recognition, is the process of decoding speech from a speaker's mouth movements. It can help people with hearing impairments, patients with laryngeal cancer, and people with vocal cord paralysis, and it is useful in noisy environments. In this paper, we aim to develop a visual-only speech recognition system based solely on video. Our main target application is in the medical field, to assist laryngectomized patients. To that end, we propose the Hahn Convolutional Neural Network (HCNN), a novel architecture that uses Hahn moments as the first layer of a Convolutional Neural Network (CNN). We show that HCNN reduces the dimensionality of the video images and shortens training time. The HCNN model is trained to classify letters, digits, or words given as video images. We evaluated the proposed method on three datasets, AVLetters, OuluVS2, and BBC LRW, and show that it achieves significant results compared with other works in the literature. |
Highlights:
• This work proposes a new architecture, the Hahn Convolutional Neural Network (HCNN).
• Complexity is greatly reduced by minimizing the number of parameters and layers.
• Experiments are conducted on the AVLetters, OuluVS2, and BBC LRW datasets.
• The reported classification accuracies are 59.23% on AVLetters, 93.72% on OuluVS2, and 46.6% on BBC LRW.
ISSN: | 0262-8856, 1872-8138 |
DOI: | 10.1016/j.imavis.2019.04.010 |
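The summary states that HCNN uses Hahn moments as its first layer to reduce the dimensionality of each video frame. Below is a minimal pure-Python sketch of computing truncated 2D Hahn moments of one frame. It is not the authors' implementation. It builds weighted orthonormal Hahn polynomials numerically by Gram-Schmidt under the standard Hahn weight w(x) = C(alpha + x, x) C(beta + N - x, N - x), rather than with the closed-form hypergeometric or recurrence formulas common in the moment literature. The parameter values (alpha = beta = 2.0), the moment order of 8, the 32x32 frame size, and the function names `weighted_hahn_basis`, `hahn_moments`, and `reconstruct` are hypothetical choices for illustration. The paper's own settings, and how the moments feed the later convolutional layers, are not given in this record.

```python
import math
import random


def _log_binom(a, k):
    """log C(a + k, k) for real a > -1 and integer k >= 0, via log-gamma."""
    return math.lgamma(a + k + 1) - math.lgamma(k + 1) - math.lgamma(a + 1)


def weighted_hahn_basis(size, order, alpha=2.0, beta=2.0):
    """Return `order` rows; row n is sqrt(w(x)) * p_n(x) for x = 0..size-1.

    The p_n (degree n) are orthonormal under the Hahn weight
    w(x) = C(alpha + x, x) * C(beta + N - x, N - x), with N = size - 1, so the
    rows are orthonormal under the plain dot product. Each p_n gets a positive
    leading coefficient; classical conventions may differ by sign.
    alpha/beta defaults are hypothetical illustration values.
    """
    if size < 2 or not 1 <= order <= size:
        raise ValueError("need size >= 2 and 1 <= order <= size")
    n_max = size - 1
    log_w = [_log_binom(alpha, x) + _log_binom(beta, n_max - x) for x in range(size)]
    top = max(log_w)
    # Rescaling w by a constant does not change the normalized rows,
    # so subtract the largest log-weight to avoid overflow.
    sqrt_w = [math.exp(0.5 * (lw - top)) for lw in log_w]
    # Map nodes affinely to [-1, 1]: same polynomial degrees, better conditioning.
    t = [2.0 * x / n_max - 1.0 for x in range(size)]
    basis = []
    for n in range(order):
        v = [s * ti ** n for s, ti in zip(sqrt_w, t)]
        for _ in range(2):  # Gram-Schmidt, repeated once for numerical stability
            for b in basis:
                proj = sum(vi * bi for vi, bi in zip(v, b))
                v = [vi - proj * bi for vi, bi in zip(v, b)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        basis.append([vi / norm for vi in v])
    return basis


def hahn_moments(image, order, alpha=2.0, beta=2.0):
    """H[m][n] = sum_y sum_x hy[m][y] * hx[n][x] * image[y][x] (image = list of rows)."""
    rows, cols = len(image), len(image[0])
    hy = weighted_hahn_basis(rows, order, alpha, beta)
    hx = weighted_hahn_basis(cols, order, alpha, beta)
    # Separable projection: along x (columns) first, then along y (rows).
    tmp = [[sum(hx[n][x] * image[y][x] for x in range(cols)) for n in range(order)]
           for y in range(rows)]
    return [[sum(hy[m][y] * tmp[y][n] for y in range(rows)) for n in range(order)]
            for m in range(order)]


def reconstruct(moments, rows, cols, alpha=2.0, beta=2.0):
    """Approximate the image from an order x order block of moments."""
    order = len(moments)
    hy = weighted_hahn_basis(rows, order, alpha, beta)
    hx = weighted_hahn_basis(cols, order, alpha, beta)
    return [[sum(moments[m][n] * hy[m][y] * hx[n][x]
                 for m in range(order) for n in range(order))
             for x in range(cols)] for y in range(rows)]


if __name__ == "__main__":
    random.seed(0)
    # Random stand-in for a 32x32 mouth-region crop (not data from any dataset).
    frame = [[random.random() for _ in range(32)] for _ in range(32)]
    feats = hahn_moments(frame, order=8)
    print(len(feats) * len(feats[0]), "coefficients instead of", 32 * 32)
```

Running it prints `64 coefficients instead of 1024`. Truncating to order K replaces an M x M frame with K x K coefficients. Because the rows are orthonormal, keeping all M orders reconstructs the frame exactly, while lower orders keep only the coarser shape information. With alpha = beta = 0 the weight is constant and the basis reduces to the discrete Chebyshev (Tchebichef) polynomials.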