The CHIL Audiovisual Corpus for Lecture and Meeting Analysis inside Smart Rooms

Bibliographic Details
Published in	Language Resources and Evaluation Vol. 41; no. 3/4; pp. 389 - 407
Main Authors	Mostefa, Djamel; Moreau, Nicolas; Choukri, Khalid; Potamianos, Gerasimos; Chu, Stephen M.; Tyagi, Ambrish; Casas, Josep R.; Turmo, Jordi; Cristoforetti, Luca; Tobia, Francesco; Pnevmatikakis, Aristodemos; Mylonakis, Vassilis; Talantzis, Fotios; Burger, Susanne; Stiefelhagen, Rainer; Bernardin, Keni; Rochet, Cedrick
Format	Journal Article
Language	English
Published	Dordrecht Springer 01.12.2007 Springer Nature B.V
Subjects	Annotations; Cameras; Communication research; Computerized corpora; Consortia; Corpus analysis; Datasets; Human behavior; Interpersonal communication; Lectures; Meetings; Microphones; Multimedia communications; Orthography; Product labeling; Recording; Sensors; Social interaction; Space; Technological change; Technology; Technology assessment; Transcription; Video data; Websites; United States > US; Italy; Germany

Cover

Loading…

More Information
Summary:	The analysis of lectures and meetings inside smart rooms has recently attracted much interest in the literature, being the focus of international projects and technology evaluations. A key enabler for progress in this area is the availability of appropriate multimodal and multi-sensory corpora, annotated with rich human activity information during lectures and meetings. This paper is devoted to exactly such a corpus, developed in the framework of the European project CHIL, "Computers in the Human Interaction Loop". The resulting data set has the potential to drastically advance the state of the art by providing numerous synchronized audio and video streams of real lectures and meetings, captured at multiple recording sites over the past 4 years. In particular, it overcomes typical shortcomings of other existing databases, which may contain limited sensory or monomodal data, exhibit constrained human behavior and interaction patterns, or lack data variability. The CHIL corpus is accompanied by rich manual annotations of both its audio and visual modalities. These provide a detailed multi-channel verbatim orthographic transcription that includes speaker turns and identities, acoustic condition information, and named entities, as well as video labels in multiple camera views that provide multi-person 3D head and 2D facial feature location information. Over the past 3 years, the corpus has been crucial to the evaluation of a multitude of audiovisual perception technologies for human activity analysis in lecture and meeting scenarios, demonstrating its utility during internal evaluations of the CHIL consortium, as well as at the recent international CLEAR and Rich Transcription evaluations. The CHIL corpus is publicly available to the research community.
ISSN:	1574-020X 1572-8412 1574-0218
DOI:	10.1007/s10579-007-9054-4