Visual Lip Reading Dataset in Turkish

Bibliographic Details
Published in: Data (Basel) Vol. 8; no. 1; p. 15
Main Authors: Berkol, Ali; Tümer-Sivri, Talya; Pervan-Akman, Nergis; Çolak, Melike; Erdem, Hamit
Format: Journal Article
Language: English
Published: Basel: MDPI AG, 01.01.2023

Summary: The proposed dataset was compiled from everyday Turkish words and phrases spoken by various people in videos posted on YouTube. The dataset is intended to support methods that detect the spoken word by recognizing patterns in lip movements or by classifying them. These methods can use supervised, unsupervised, and semi-supervised machine learning algorithms. Most lip reading datasets consist of people recorded on camera against fixed backgrounds under identical conditions. By contrast, the dataset presented here consists of images suited to machine learning models developed for real-life challenges. It contains a total of 2335 instances taken from TV series, movies, vlogs, and song clips on YouTube. The images vary with factors such as how people say words, accent, speaking rate, gender, and age. Furthermore, the instances come from videos that were not created manually and that differ in camera angle, shadows, resolution, and brightness. The most important feature of this lip reading dataset is its contribution to the pool of non-synthetic Turkish datasets, which currently offers little variety. With this dataset, machine learning studies can be carried out in many areas, such as education, security, and social life.
ISSN: 2306-5729
DOI: 10.3390/data8010015