Linguistic Resources for Meeting Speech Recognition

Bibliographic Details
Published in: Machine Learning for Multimodal Interaction, pp. 390-401
Main Authors: Glenn, Meghan Lammie; Strassel, Stephanie
Format: Book Chapter
Language: English
Published: Berlin, Heidelberg: Springer Berlin Heidelberg, 2006
Series: Lecture Notes in Computer Science
ISBN: 3540325492, 9783540325499
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/11677482_33

Summary: This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for all two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development including transcription tools to support our efforts.