Skeleton Aware Multi-modal Sign Language Recognition

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3408-3418
Main Authors: Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2021
ISSN: 2160-7516
DOI: 10.1109/CVPRW53098.2021.00380

Abstract
Sign language is commonly used by deaf or speech-impaired people to communicate, but it requires significant effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing signs from given videos. It is an essential yet challenging task, since sign language is performed with fast and complex movements of hand gestures, body posture, and even facial expressions. Recently, skeleton-based action recognition has attracted increasing attention because it is independent of subject and background variation. However, skeleton-based SLR remains underexplored due to the lack of annotations on hand keypoints. Some efforts have been made to extract hand keypoints with hand detectors and pose estimators and to recognize sign language via neural networks, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) that exploits multi-modal information for a higher recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. RGB and depth modalities are also incorporated and assembled into our framework to provide global information complementary to the skeleton-based SL-GCN and SSTCN. As a result, SAM-SLR achieves the highest performance in both the RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at https://github.com/jackyjsy/CVPR21Chal-SLR
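The record describes SL-GCN only at a high level (a graph convolution network over skeleton keypoints). As an illustration of the general idea behind skeleton-based graph convolution, and not the paper's actual implementation, a minimal spatial graph convolution over a toy hand skeleton might look like this (all shapes and the 5-joint graph are invented for the example):

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency matrix (with self-loops)
    for a skeleton graph given as a list of joint-index pairs."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ A @ d_inv_sqrt

def spatial_graph_conv(x, A_norm, W):
    """One graph-convolution step: aggregate neighboring joints via the
    normalized adjacency, then apply a learned linear map.
    x: (frames, joints, channels_in), W: (channels_in, channels_out)."""
    return np.einsum("uv,tvc->tuc", A_norm, x) @ W

# Toy 5-joint chain "skeleton" observed over 4 frames in 3-D coordinates.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_norm = normalized_adjacency(edges, num_joints=5)
x = np.random.randn(4, 5, 3)   # (frames, joints, coordinate channels)
W = np.random.randn(3, 16)     # channel projection to 16 features
out = spatial_graph_conv(x, A_norm, W)
print(out.shape)               # (4, 5, 16)
```

A full model would stack many such layers, interleave temporal convolutions over the frame axis, and learn the weights by backpropagation; this sketch only shows the per-layer spatial aggregation.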
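The abstract says RGB and depth streams are "assembled" with the skeleton models into one framework, but this record does not specify how. A common approach for such ensembles, shown purely as a hypothetical sketch (the class counts, logits, and weights below are invented), is late fusion: a weighted sum of per-modality class probabilities.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logits_by_modality, weights):
    """Predict a class by a weighted sum of per-modality probabilities.
    logits_by_modality: dict name -> (num_classes,) logit vector.
    weights: dict name -> float, same keys."""
    fused = sum(weights[m] * softmax(l) for m, l in logits_by_modality.items())
    return int(np.argmax(fused))

# Hypothetical per-modality scores for one clip in a 4-class toy problem.
logits = {
    "skeleton": np.array([2.0, 0.5, 0.1, 0.0]),
    "rgb":      np.array([1.5, 2.2, 0.3, 0.1]),
    "depth":    np.array([0.2, 1.8, 0.4, 0.0]),
}
weights = {"skeleton": 1.0, "rgb": 0.9, "depth": 0.4}
print(late_fusion(logits, weights))  # -> 0
```

Here the confident skeleton stream outvotes the RGB and depth streams, which individually prefer class 1; tuning the per-modality weights on validation data is what lets the fused score beat any single stream.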
Authors: Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun (all: Northeastern University, Boston, MA, USA)
CODEN: IEEPAD
Discipline: Applied Sciences
EISBN: 1665448997, 9781665448994
EISSN: 2160-7516
End Page: 3418
External Document ID: 9523142
Genre: orig-research
Grant Information: Army Research Office (funder ID: 10.13039/100000183)
Page Count: 11
Publication Title Abbreviation: CVPRW
Publication Date: June 2021
Subject Terms: Annotations; Assistive technology; Conferences; Convolution; Detectors; Gesture recognition; Neural networks
URI: https://ieeexplore.ieee.org/document/9523142