The IBM 2006 GALE Arabic ASR System

Bibliographic Details
Published in: 2007 IEEE International Conference on Acoustics, Speech and Signal Processing - ICASSP '07, Vol. 4, pp. IV-349 - IV-352
Main Authors: Soltau, H., Saon, G., Kingsbury, B., Kuo, J., Mangu, L., Povey, D., Zweig, G.
Format: Conference Proceeding
Language: English
Published: IEEE, 01.04.2007
Summary: This paper describes the advances made in IBM's Arabic broadcast news transcription system, which was fielded in the 2006 GALE ASR and machine translation evaluation. These advances were instrumental in lowering the word error rate by 42% relative over the course of one year and include: training on additional LDC data, large-scale discriminative training on 1800 hours of unsupervised data, automatic vowelization using a flat-start approach, use of a large vocabulary with 617K words and 2 million pronunciations, and lastly, a system architecture based on cross-adaptation between unvowelized and vowelized acoustic models.
ISBN: 9781424407279, 1424407273
ISSN: 1520-6149, 2379-190X
DOI: 10.1109/ICASSP.2007.366921