Skeleton Aware Multi-modal Sign Language Recognition
Published in | IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), pp. 3408-3418 |
---|---|
Main Authors | Songyao Jiang, Bin Sun, Lichen Wang, Yue Bai, Kunpeng Li, Yun Fu (Northeastern University, Boston, MA, USA) |
Format | Conference Proceeding |
Genre | Original research |
Language | English |
Published | IEEE, 01.06.2021 |
Discipline | Applied Sciences |
Subjects | Annotations; Assistive technology; Conferences; Convolution; Detectors; Gesture recognition; Neural networks |
Funding | Army Research Office (funder ID 10.13039/100000183) |
Online Access | https://ieeexplore.ieee.org/document/9523142 |
ISSN | 2160-7516 (electronic) |
EISBN | 1665448997; 9781665448994 |
CODEN | IEEPAD |
DOI | 10.1109/CVPRW53098.2021.00380 |
Abstract | Sign language is commonly used by deaf or speech-impaired people to communicate, but it requires significant effort to master. Sign Language Recognition (SLR) aims to bridge the gap between sign language users and others by recognizing signs from given videos. It is an essential yet challenging task, since sign language is performed with fast and complex movements of hand gestures, body posture, and even facial expressions. Recently, skeleton-based action recognition has attracted increasing attention because it is largely independent of subject appearance and background variation. However, skeleton-based SLR remains underexplored due to the lack of annotations on hand keypoints. Some efforts combine hand detectors with pose estimators to extract hand keypoints and learn to recognize sign language via neural networks, but none of them outperforms RGB-based methods. To this end, we propose a novel Skeleton Aware Multi-modal SLR framework (SAM-SLR) that exploits multi-modal information for a higher recognition rate. Specifically, we propose a Sign Language Graph Convolution Network (SL-GCN) to model the embedded dynamics and a novel Separable Spatial-Temporal Convolution Network (SSTCN) to exploit skeleton features. RGB and depth modalities are also incorporated and assembled into our framework to provide global information complementary to the skeleton-based SL-GCN and SSTCN. As a result, SAM-SLR achieves the highest performance in both the RGB (98.42%) and RGB-D (98.53%) tracks of the 2021 Looking at People Large Scale Signer Independent Isolated SLR Challenge. Our code is available at https://github.com/jackyjsy/CVPR21Chal-SLR. |
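
The abstract describes SAM-SLR as a set of skeleton-based streams (SL-GCN over a keypoint graph, SSTCN over skeleton features) combined with complementary RGB and depth streams. The authors' actual implementation is in the repository linked above; the snippet below is only a minimal, hypothetical sketch of the general multi-stream late-fusion idea, in which each modality produces class scores that are combined with per-modality weights. The `LateFusionEnsemble` class, the stream names, and the learnable softmax weighting are illustrative assumptions, not the paper's code.

```python
# Minimal, hypothetical sketch of multi-modal late fusion for isolated SLR.
# NOT the authors' SAM-SLR implementation (see the linked GitHub repository);
# class names, stream keys, and the weighting scheme are illustrative only.
import torch
import torch.nn as nn


class LateFusionEnsemble(nn.Module):
    """Fuses per-modality class scores with per-modality weights.

    Each stream is any nn.Module that maps its modality's input tensor to
    (batch, num_classes) logits, e.g. a skeleton GCN, a skeleton-feature
    network, or 3D CNNs over RGB / depth frames.
    """

    def __init__(self, streams: "dict[str, nn.Module]"):
        super().__init__()
        self.streams = nn.ModuleDict(streams)
        # One scalar weight per modality; these could also be hand-tuned on validation data.
        self.fusion_weights = nn.Parameter(torch.ones(len(streams)))

    def forward(self, inputs: "dict[str, torch.Tensor]") -> torch.Tensor:
        weights = torch.softmax(self.fusion_weights, dim=0)
        fused = [
            # Per-modality probabilities, scaled by that modality's fusion weight.
            w * torch.softmax(stream(inputs[name]), dim=-1)
            for w, (name, stream) in zip(weights, self.streams.items())
        ]
        # Weighted sum of per-modality score vectors -> (batch, num_classes).
        return torch.stack(fused, dim=0).sum(dim=0)


if __name__ == "__main__":
    num_classes = 226  # example class count only; not stated in the record above
    # Stand-in linear streams; real streams would be SL-GCN, SSTCN, and RGB/depth CNNs.
    streams = {
        "skeleton": nn.Sequential(nn.Flatten(), nn.Linear(33 * 3, num_classes)),
        "rgb": nn.Sequential(nn.Flatten(), nn.Linear(512, num_classes)),
        "depth": nn.Sequential(nn.Flatten(), nn.Linear(512, num_classes)),
    }
    model = LateFusionEnsemble(streams)
    batch = {
        "skeleton": torch.randn(4, 33, 3),  # toy keypoints: 33 joints x (x, y, confidence)
        "rgb": torch.randn(4, 512),         # toy clip-level RGB features
        "depth": torch.randn(4, 512),       # toy clip-level depth features
    }
    print(model(batch).shape)  # torch.Size([4, 226])
```

Score-level fusion is sketched here because it lets each modality keep its own architecture and preprocessing; whether the actual framework fuses at the score or feature level, and how each stream is trained, is specified in the paper and the released code rather than in this record.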