Linguistic Resources for Meeting Speech Recognition

Bibliographic Details
Published in	Machine Learning for Multimodal Interaction, pp. 390–401
Main Authors	Glenn, Meghan Lammie; Strassel, Stephanie
Format	Book Chapter
Language	English
Published	Berlin, Heidelberg: Springer Berlin Heidelberg, 2006
Series	Lecture Notes in Computer Science
Subjects	Audio Signal; Broadcast News; Human Language Technology; Linguistic Resource; Segment Boundary
ISBN	3540325492 9783540325499
ISSN	0302-9743 1611-3349
DOI	10.1007/11677482_33

Summary:	This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for all two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development including transcription tools to support our efforts.