Skeleton Aware Multi-modal Sign Language Recognition

Bibliographic Details
Published in: IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3408-3418
Main Authors: Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun
Format: Conference Proceeding
Language: English
Published: IEEE, 01.06.2021
ISSN: 2160-7516
DOI: 10.1109/CVPRW53098.2021.00380

Abstract
Sign language is commonly used by deaf or speech-impaired people to communicate, but it requires significant effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing signs from given videos. It is an essential yet challenging task, since sign language is performed with fast and complex movements of hand gestures, body posture, and even facial expressions. Recently, skeleton-based action recognition has attracted increasing attention because it is independent of subject and background variation. However, skeleton-based SLR remains underexplored due to the lack of annotations on hand keypoints. Some efforts have been made to extract hand keypoints with hand detectors and pose estimators and to recognize sign language via neural networks, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) that exploits multi-modal information for a higher recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. RGB and depth modalities are also incorporated and assembled into our framework to provide global information complementary to the skeleton-based SL-GCN and SSTCN. As a result, SAM-SLR achieves the highest performance in both the RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at https://github.com/jackyjsy/CVPR21Chal-SLR
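The record describes SL-GCN only at a high level (a graph convolution network over skeleton keypoints). As an illustration of the general idea behind skeleton-based graph convolution, and not the paper's actual implementation, a minimal spatial graph convolution over a toy hand skeleton might look like this (all shapes and the 5-joint graph are invented for the example):

```python
import numpy as np

def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency matrix (with self-loops)
    for a skeleton graph given as a list of joint-index pairs."""
    A = np.eye(num_joints)
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0
    deg = A.sum(axis=1)
    d_inv_sqrt = np.diag(deg ** -0.5)
    return d_inv_sqrt @ A @ d_inv_sqrt

def spatial_graph_conv(x, A_norm, W):
    """One graph-convolution step: aggregate neighboring joints via the
    normalized adjacency, then apply a learned linear map.
    x: (frames, joints, channels_in), W: (channels_in, channels_out)."""
    return np.einsum("uv,tvc->tuc", A_norm, x) @ W

# Toy 5-joint chain "skeleton" observed over 4 frames in 3-D coordinates.
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
A_norm = normalized_adjacency(edges, num_joints=5)
x = np.random.randn(4, 5, 3)   # (frames, joints, coordinate channels)
W = np.random.randn(3, 16)     # channel projection to 16 features
out = spatial_graph_conv(x, A_norm, W)
print(out.shape)               # (4, 5, 16)
```

A full model would stack many such layers, interleave temporal convolutions over the frame axis, and learn the weights by backpropagation; this sketch only shows the per-layer spatial aggregation.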
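The abstract says RGB and depth streams are "assembled" with the skeleton models into one framework, but this record does not specify how. A common approach for such ensembles, shown purely as a hypothetical sketch (the class counts, logits, and weights below are invented), is late fusion: a weighted sum of per-modality class probabilities.

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def late_fusion(logits_by_modality, weights):
    """Predict a class by a weighted sum of per-modality probabilities.
    logits_by_modality: dict name -> (num_classes,) logit vector.
    weights: dict name -> float, same keys."""
    fused = sum(weights[m] * softmax(l) for m, l in logits_by_modality.items())
    return int(np.argmax(fused))

# Hypothetical per-modality scores for one clip in a 4-class toy problem.
logits = {
    "skeleton": np.array([2.0, 0.5, 0.1, 0.0]),
    "rgb":      np.array([1.5, 2.2, 0.3, 0.1]),
    "depth":    np.array([0.2, 1.8, 0.4, 0.0]),
}
weights = {"skeleton": 1.0, "rgb": 0.9, "depth": 0.4}
print(late_fusion(logits, weights))  # -> 0
```

Here the confident skeleton stream outvotes the RGB and depth streams, which individually prefer class 1; tuning the per-modality weights on validation data is what lets the fused score beat any single stream.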
Authors: Jiang, Songyao; Sun, Bin; Wang, Lichen; Bai, Yue; Li, Kunpeng; Fu, Yun (all: Northeastern University, Boston, MA, USA)
CODEN: IEEPAD
Discipline: Applied Sciences
EISBN: 1665448997, 9781665448994
EISSN: 2160-7516
End Page: 3418
External Document ID: 9523142
Genre: orig-research
Grant Information: Army Research Office (funder ID: 10.13039/100000183)
Page Count: 11
Publication Title Abbreviation: CVPRW
Publication Date: June 2021
Subject Terms: Annotations; Assistive technology; Conferences; Convolution; Detectors; Gesture recognition; Neural networks
URI: https://ieeexplore.ieee.org/document/9523142