A comprehensive multimodal dataset for contactless lip reading and acoustic analysis

Bibliographic Details
Published in	Scientific data Vol. 10; no. 1; pp. 895 - 17
Main Authors	Ge, Yao, Tang, Chong, Li, Haobo, Chen, Zikang, Wang, Jingyan, Li, Wenda, Cooper, Jonathan, Chetty, Kevin, Faccio, Daniele, Imran, Muhammad, Abbasi, Qammer H.
Format	Journal Article
Language	English
Published	London: Nature Publishing Group UK (Nature Portfolio), 13 December 2023
Subjects	639/166/985 639/166/987 639/766/930/1032 Data Descriptor Datasets Humanities and Social Sciences Lip Lipreading Motion detection multidisciplinary Remote sensing Science Science (multidisciplinary) Speech Speech recognition Voice recognition

More Information
Summary:	Small-scale motion detection using non-invasive remote sensing techniques has recently attracted significant interest in the field of speech recognition. This dataset paper aims to support the enhancement and restoration of speakers' speech information from diverse data sources. We introduce a novel multimodal dataset, called RVTALL, that combines radio frequency (RF), visual, text, audio, laser and lip landmark information. Specifically, the dataset comprises 7.5 GHz channel impulse response (CIR) data from ultra-wideband (UWB) radars, 77 GHz frequency-modulated continuous wave (FMCW) data from a millimeter-wave (mmWave) radar, visual and audio recordings, lip landmarks and laser data, offering a unique multimodal resource for speech recognition research. A depth camera is additionally used to record the subject's lip landmarks and voice. Approximately 400 minutes of annotated speech profiles are provided, collected from 20 participants speaking 5 vowels, 15 words and 16 sentences. The dataset has been validated and has potential for research on lip reading and multimodal speech recognition.
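The corpus figures in the summary can be combined with a little arithmetic: each participant speaks 5 + 15 + 16 = 36 distinct prompts, and roughly 400 minutes across 20 participants averages about 20 minutes per participant. The sketch below records those stated figures in a small Python structure. The class name `CorpusSummary`, its fields and the `MODALITIES` tuple are hypothetical illustrations, not the dataset's actual file layout or any published loader.

```python
# Hypothetical summary of the RVTALL corpus, using only figures stated in the abstract.
# Names and fields are illustrative; they do not describe the dataset's real file structure.
from dataclasses import dataclass

# The six modality families named in the abstract (the RF family covers both radars).
MODALITIES = ("radio_frequency", "visual", "text", "audio", "laser", "lip_landmarks")


@dataclass(frozen=True)
class CorpusSummary:
    participants: int
    vowels: int
    words: int
    sentences: int
    total_minutes: float  # approximate total of annotated speech

    def prompt_count(self) -> int:
        # Distinct prompts each participant speaks: vowels + words + sentences
        return self.vowels + self.words + self.sentences

    def minutes_per_participant(self) -> float:
        # Average annotated speech per participant (approximate, since the total is approximate)
        return self.total_minutes / self.participants


rvtall = CorpusSummary(participants=20, vowels=5, words=15, sentences=16, total_minutes=400)
print(rvtall.prompt_count())             # 36
print(rvtall.minutes_per_participant())  # 20.0
```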
ISSN:	2052-4463
DOI:	10.1038/s41597-023-02793-w