KSL-Guide: A Large-scale Korean Sign Language Dataset Including Interrogative Sentences for Guiding the Deaf and Hard-of-Hearing

Bibliographic Details
Published in: 2021 16th IEEE International Conference on Automatic Face and Gesture Recognition (FG 2021), pp. 1-8
Main Authors: Ham, Soomin; Park, Kibaek; Jang, YeongJun; Oh, Youngtaek; Yun, Seokmin; Yoon, Sukwon; Kim, Chang Jo; Park, Han-Mu; Kweon, In So
Format: Conference Proceeding
Language: English
Published: IEEE, 15.12.2021
Summary: Many advancements in computer vision and machine learning have shown potential for significantly improving the lives of people with disabilities. In particular, recent research has demonstrated that deep neural network models can be used to bridge the gap between the deaf who use sign language and hearing people. The major impediment to advancing such models is the lack of high-quality, large-scale training data. Moreover, previously released sign language datasets include few or no interrogative sentences compared to declarative sentences. In this paper, we introduce a new publicly available large-scale Korean Sign Language (KSL) dataset, KSL-Guide, that includes both declarative sentences and a comparable number of interrogative sentences, which are required for a model to achieve high performance in real-world interactive tasks deployed in service applications. Our dataset contains a total of 121K sign language video samples featuring sentences and words signed by native KSL speakers, with extensive annotations (e.g., gloss, translation, keypoints, and timestamps). We exploit a multi-camera system to produce 3D human pose keypoints as well as 2D keypoints from multi-view RGB images. Our experiments quantitatively demonstrate that including interrogative sentences in training greatly improves performance on sign language recognition and translation tasks. Furthermore, we show qualitative results by developing a prototype application using our dataset, providing an interactive guide service that helps lower the communication barrier between sign language speakers and hearing people.
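The summary states that 3D human pose keypoints are produced from 2D keypoints captured by a multi-camera system. The record does not include the authors' pipeline, so the following is only a minimal sketch of the standard approach to that step, assuming direct linear transform (DLT) triangulation with known per-camera projection matrices; the function names, arguments, and array shapes are illustrative, not part of the KSL-Guide release.

```python
# Minimal sketch (not the authors' code): recover 3D keypoints from 2D keypoints
# observed by multiple calibrated RGB cameras via DLT triangulation.
import numpy as np

def triangulate_point(projections, points_2d):
    """Triangulate one 3D point from its 2D observations.

    projections : list of 3x4 camera projection matrices (K [R | t]).
    points_2d   : list of (x, y) pixel coordinates, one per camera.
    Returns the 3D point in world coordinates.
    """
    A = []
    for P, (x, y) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        A.append(x * P[2] - P[0])
        A.append(y * P[2] - P[1])
    A = np.stack(A)
    # Solve A X = 0 in the least-squares sense; X is the last right singular vector.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]  # de-homogenize

def triangulate_pose(proj_mats, keypoints_2d):
    """Hypothetical usage: triangulate every joint of a pose from N synchronized views.

    proj_mats    : array of shape (N, 3, 4), one projection matrix per camera.
    keypoints_2d : array of shape (N, J, 2), 2D joints detected in each view.
    Returns an array of shape (J, 3) with the 3D joints.
    """
    num_joints = keypoints_2d.shape[1]
    return np.stack([
        triangulate_point(list(proj_mats), [kp[j] for kp in keypoints_2d])
        for j in range(num_joints)
    ])
```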
DOI: 10.1109/FG52635.2021.9667011