Multiview meta-metric learning for sign language recognition using triplet loss embeddings
Published in | Pattern Analysis and Applications (PAA), Vol. 26, No. 3, pp. 1125–1141
---|---
Main Authors | , ,
Format | Journal Article
Language | English
Published | London: Springer London; Springer Nature B.V., 01.08.2023
Subjects |
Summary | Multiview video processing for recognition is a hard problem when the subject is in continuous motion. The problem becomes even harder when the subject is a human being and the actions to be recognized from the video data are a complex set of actions called sign language. Although many deep learning models have been successfully applied to sign language recognition (SLR), very few models have considered multiple views in their training sets. In this work, we propose to apply meta-metric learning to video-based SLR. In contrast to traditional metric learning, where the triplet loss is constructed on sample-based distances, the meta-metric learns on set-based distances. Consequently, we construct meta-cells on the entire multiview dataset and adopt a task-based learning approach with respect to support cells and query sets. Additionally, we propose a maximum view-pooled distance on sub-tasks for binding intra-class views. Experiments conducted on the multiview sign language dataset and four human action recognition datasets show that the proposed multiview meta-metric learning model (MVDMML) achieves higher accuracies than the baselines.
ISSN | 1433-7541; 1433-755X
DOI | 10.1007/s10044-023-01134-2
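
The summary above describes a triplet loss built on set-based (cell-to-cell) distances together with a maximum view-pooled distance for binding intra-class views. The following is a minimal sketch of one plausible reading of that objective, not the authors' implementation: the function names (`view_pooled_distance`, `set_triplet_loss`), the embedding dimension, and the use of PyTorch are assumptions made for illustration.

```python
# Illustrative sketch (not the paper's code): a triplet loss on set-based
# distances between multiview embedding "cells", with a max-pooled cross-view
# distance. Each cell is assumed to be a tensor of shape (num_views, embed_dim)
# produced by some backbone network.
import torch
import torch.nn.functional as F


def view_pooled_distance(query: torch.Tensor, support: torch.Tensor) -> torch.Tensor:
    """Max-pool pairwise Euclidean distances between two sets of view embeddings.

    query:   (V_q, D) view embeddings of one sample/cell
    support: (V_s, D) view embeddings of another cell
    Returns a scalar set-to-set distance: the largest cross-view distance.
    """
    pairwise = torch.cdist(query, support)  # (V_q, V_s) Euclidean distances
    return pairwise.max()


def set_triplet_loss(anchor: torch.Tensor,
                     positive: torch.Tensor,
                     negative: torch.Tensor,
                     margin: float = 1.0) -> torch.Tensor:
    """Triplet loss computed on set-based distances instead of sample-based ones."""
    d_pos = view_pooled_distance(anchor, positive)
    d_neg = view_pooled_distance(anchor, negative)
    return F.relu(d_pos - d_neg + margin)


if __name__ == "__main__":
    # Toy usage with random embeddings standing in for multiview video features.
    torch.manual_seed(0)
    anchor = torch.randn(3, 128)    # 3 views of the anchor sign
    positive = torch.randn(3, 128)  # same class, different views
    negative = torch.randn(3, 128)  # different class
    print(set_triplet_loss(anchor, positive, negative).item())
```

Under this reading, max-pooling the cross-view distances penalizes the worst-aligned pair of views, which is one way to push all views of the same class toward a common region of the embedding space; whether the paper pools exactly this way is not stated in the record above.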