Hallucinated n-best lists for discriminative language modeling

This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong ba...

Bibliographic Details
Published in 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 5001-5004
Main Authors Sagae, K., Lehr, M., Prud'hommeaux, E., Xu, P., Glenn, N., Karakos, D., Khudanpur, S., Roark, B., Saraclar, M., Shafran, I., Bikel, D., Callison-Burch, C., Cao, Y., Hall, K., Hasler, E., Koehn, P., Lopez, A., Post, M., Riley, D.
Format Conference Proceeding
Language English
Published IEEE 01.03.2012
Abstract This paper investigates semi-supervised methods for discriminative language modeling, whereby n-best lists are "hallucinated" for given reference text and are then used for training n-gram language models using the perceptron algorithm. We perform controlled experiments on a very strong baseline English CTS system, comparing three methods for simulating ASR output, and compare the results with training with "real" n-best list output from the baseline recognizer. We find that methods based on extracting phrasal cohorts - similar to methods from machine translation for extracting phrase tables - yielded the largest gains of our three methods, achieving over half of the WER reduction of the fully supervised methods.
DOI 10.1109/ICASSP.2012.6289043
Discipline Engineering
EISBN 9781467300469, 1467300446, 9781467300445, 1467300462
EISSN 2379-190X
ISBN 1467300454, 9781467300452
ISSN 1520-6149
OpenAccessLink https://www.pure.ed.ac.uk/ws/files/18890623/Sagae_Lehr_ET_AL_2012_Hallucinated_n_best_lists_for_discriminative_language_modeling.pdf
PageCount 4
SubjectTerms automatic speech recognition
Data models
discriminative training
Hidden Markov models
language modeling
semi-supervised methods
Speech
Speech recognition
Training
Training data
Transducers
URI https://ieeexplore.ieee.org/document/6289043