Attention-Based 3D-CNNs for Large-Vocabulary Sign Language Recognition

Sign language recognition (SLR) is an important and challenging research topic in the multimedia field. Conventional techniques for SLR rely on hand-crafted features, which achieve limited success. In this paper, we present attention-based 3D convolutional neural networks (3D-CNNs) for SLR. The framework has two advantages: 3D-CNNs learn spatio-temporal features from raw video without prior knowledge, and the attention mechanism helps to select the relevant clues. When training the 3D-CNN to capture spatio-temporal features, spatial attention is incorporated into the network to focus on the areas of interest. After feature extraction, temporal attention is utilized to select the significant motions for classification. The proposed method is evaluated on two large-scale sign language data sets. The first, collected by ourselves, is a Chinese sign language data set consisting of 500 categories. The other is the ChaLearn14 benchmark. The experimental results demonstrate the effectiveness of our approach compared with state-of-the-art algorithms.
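
The abstract outlines a pipeline in which a 3D-CNN extracts spatio-temporal features from raw video, spatial attention re-weights locations within each feature map during training, and temporal attention selects the significant motions before classification. As a rough sketch of how these three pieces could be wired together, the following minimal PyTorch example is illustrative only: the one-layer backbone, module names, and layer sizes are assumptions, not the authors' actual architecture.

```python
# Hypothetical sketch of an attention-based 3D-CNN for SLR; it mirrors the
# abstract's structure (3D-CNN -> spatial attention -> temporal attention ->
# classifier) but is NOT the paper's implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SpatialAttention3D(nn.Module):
    """Re-weights spatial locations of a 3D feature map (B, C, T, H, W)."""

    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv3d(channels, 1, kernel_size=1)  # 1x1x1 scoring conv

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        logits = self.score(x).view(b, 1, t, h * w)
        attn = F.softmax(logits, dim=-1).view(b, 1, t, h, w)
        # Scale by h*w so the re-weighted map keeps roughly the original magnitude.
        return x * attn * (h * w)


class TemporalAttentionPool(nn.Module):
    """Attention-weighted average of per-frame feature vectors (B, T, D)."""

    def __init__(self, dim: int):
        super().__init__()
        self.score = nn.Linear(dim, 1)

    def forward(self, feats: torch.Tensor) -> torch.Tensor:
        attn = F.softmax(self.score(feats), dim=1)  # (B, T, 1) weights over time
        return (attn * feats).sum(dim=1)            # (B, D) pooled feature


class AttentionSLR(nn.Module):
    """Toy classifier: shallow 3D-CNN plus spatial and temporal attention."""

    def __init__(self, num_classes: int = 500, channels: int = 64):
        super().__init__()
        self.backbone = nn.Sequential(  # a real model would be much deeper
            nn.Conv3d(3, channels, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool3d((1, 2, 2)),
        )
        self.spatial_attn = SpatialAttention3D(channels)
        self.temporal_attn = TemporalAttentionPool(channels)
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, video: torch.Tensor) -> torch.Tensor:  # (B, 3, T, H, W)
        x = self.backbone(video)
        x = self.spatial_attn(x)
        x = x.mean(dim=(3, 4)).transpose(1, 2)  # (B, T, C) per-frame features
        x = self.temporal_attn(x)               # (B, C)
        return self.classifier(x)               # (B, num_classes)


# Example: a 500-category setup (matching the size of the Chinese sign
# language data set mentioned in the abstract) on two 16-frame clips.
model = AttentionSLR(num_classes=500)
logits = model(torch.randn(2, 3, 16, 112, 112))
print(logits.shape)  # torch.Size([2, 500])
```

The scaling by h * w in the spatial module and the single-layer backbone are simplifications; in practice both attention modules would be trained end-to-end with a deep 3D backbone.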

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, Vol. 29, No. 9, pp. 2822-2832
Main Authors: Huang, Jie; Zhou, Wengang; Li, Houqiang; Li, Weiping
Format: Journal Article
Language: English
Published: New York: IEEE, 1 September 2019
Publisher: The Institute of Electrical and Electronics Engineers, Inc. (IEEE)

ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2018.2870740