Linguistic Resources for Meeting Speech Recognition
Published in | Machine Learning for Multimodal Interaction, pp. 390-401 |
---|---|
Main Authors | , |
Format | Book Chapter |
Language | English |
Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2006 |
Series | Lecture Notes in Computer Science |
Subjects | |
ISBN | 3540325492, 9783540325499 |
ISSN | 0302-9743 (print), 1611-3349 (electronic) |
DOI | 10.1007/11677482_33 |
Summary: | This paper describes efforts by the University of Pennsylvania’s Linguistic Data Consortium to create and distribute shared linguistic resources – including data, annotations, tools and infrastructure – to support the Rich Transcription 2005 Spring Meeting Recognition Evaluation. In addition to distributing large volumes of training data, LDC produced reference transcripts for the RT-05S conference room evaluation corpus, which represents a variety of subjects, scenarios and recording conditions. Careful verbatim reference transcripts including rich markup were created for all two hours of data. One hour was also selected for a contrastive study using a quick transcription methodology. We review the two methodologies and discuss qualitative differences in the resulting transcripts. Finally, we describe infrastructure development including transcription tools to support our efforts. |