Combinational sign language recognition

Bibliographic Details
Published in	Computer vision and image understanding Vol. 241; p. 103972
Main Authors	Gao, Liqing, Feng, Wei, Lyu, Fan, Wan, Liang
Format	Journal Article
Language	English
Published	Elsevier Inc 01.04.2024
Subjects	Combinational learning; Context passing; Feature insertion; Location prediction; Sign language recognition (SLR); 41A05; 41A10; 65D05; 65D17
Summary:	Traditional Sign Language Recognition (SLR) is limited by the small scale of sign language (SL) datasets, which can cause over-fitting to narrow contexts and applications. To address this problem, we propose, for the first time, a Combinational Sign Language Recognition (CombSLR) framework, which serves as an augmentation that extends existing datasets by combining continuous videos (called Template, T) with isolated videos (called Entity, E). The CombSLR framework is trained on combinational SL data (T & E) and applied to continuous SL data. However, because the combination location is unknown and the context of any T-E pair is inconsistent, naively inserting E into T is infeasible. To tackle this issue, we propose a simple yet effective method named EinT, which contains two main modules: (1) Location Candidate Prediction, which produces a reliable insertion location by considering inter-frame relationships and keeps the network end-to-end trainable; and (2) Feature Insertion via Context Passing, which eliminates the context inconsistency between T and E features. EinT is readily compatible with existing SLR models and implements data augmentation at the feature level during the training stage. We conduct extensive experiments on multiple publicly available sign language datasets, e.g., CCLS, CSL+DEVISIGN-D and CSL-Daily+DEVISIGN-D. The experimental results show that CombSLR significantly improves existing SLR methods, e.g., by an average of 15.1% on the CCLS dataset and 6.4% on the CSL dataset in terms of the WER metric, which demonstrates the superiority of the CombSLR framework.
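The summary reports its gains in WER (word error rate), where lower values are better. As a minimal sketch of how that metric is commonly computed for gloss sequences, the snippet below takes the edit distance between a reference and a predicted sequence and divides it by the reference length. The function name and the example glosses are illustrative assumptions, not taken from the article, and the article's exact evaluation protocol may differ.

```python
def word_error_rate(reference, hypothesis):
    """Edit distance between two gloss sequences, divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one gloss")
    # prev[j] is the edit distance between the reference prefix processed
    # so far and the first j glosses of the hypothesis.
    prev = list(range(len(hypothesis) + 1))
    for i, ref_tok in enumerate(reference, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hypothesis, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1       # reference gloss missing from hypothesis
            insertion = curr[j - 1] + 1  # extra gloss in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(reference)


# Hypothetical gloss sequences: one substitution plus one deletion.
reference = ["I", "GO", "SCHOOL", "TODAY"]
hypothesis = ["I", "WALK", "SCHOOL"]
wer = word_error_rate(reference, hypothesis)
```

WER is usually quoted as a percentage, so the value returned here would be multiplied by 100 before being compared with figures such as those in the summary.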
• We propose a novel and general Combinational Sign Language Recognition (CombSLR) framework, which for the first time serves as a data augmentation method to address the limited scale of SL data.
• Within the CombSLR framework, we propose the EinT method, which reliably inserts E into T at the feature level to combine T and E effectively.
• Extensive experiments on three public datasets demonstrate the effectiveness of the proposed EinT method, which can be embedded into any SLR model to improve its performance.
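To make the idea of inserting E into T at the feature level concrete, here is a rough sketch using NumPy. It is not the authors' EinT implementation: the record does not describe how the two modules are built, so `pick_location` uses a hand-written placeholder heuristic (the gap between the two least similar adjacent template frames) where the article uses a trained Location Candidate Prediction module, and Context Passing is not modelled at all. All function names and array shapes are assumptions made for illustration.

```python
import numpy as np


def pick_location(template):
    """Placeholder heuristic (NOT the article's learned module): return the
    index of the gap between the two least similar adjacent frames."""
    a, b = template[:-1], template[1:]
    norms = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1) + 1e-8
    cosine = np.sum(a * b, axis=1) / norms
    return int(np.argmin(cosine)) + 1


def insert_entity(template, entity, location):
    """Insert entity features (m, d) into template features (n, d) so that the
    entity starts at frame index `location` of the combined sequence."""
    if not 0 <= location <= len(template):
        raise ValueError("location must lie between 0 and len(template)")
    return np.concatenate([template[:location], entity, template[location:]], axis=0)


# Toy frame features: four template frames and two entity frames, d = 2.
template = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
entity = np.full((2, 2), 0.5)

location = pick_location(template)
combined = insert_entity(template, entity, location)
```

In this toy case the template changes abruptly between its second and third frames, so the heuristic places the entity there and the combined sequence has six frames. Hard concatenation like this leaves exactly the context mismatch at the seams that the article's Context Passing step is meant to remove.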
ISSN:	1077-3142 1090-235X
DOI:	10.1016/j.cviu.2024.103972